Knowledge-Centric Hallucination Detection

Xiangkun Hu, Dongyu Ru, Lin Qiu, Qipeng Guo, Tianhang Zhang, Yang Xu, Yun Luo, Pengfei Liu, Yue Zhang, Zheng Zhang · 2024 · DOI 10.18653/v1/2024

Canonical reference. 77% of citing Pith papers cite this work as background.

87 Pith papers citing it

citation-role summary

background: 27 · method: 2 · dataset: 1

citation-polarity summary

background: 23 · support: 2 · unclear: 2 · use method: 2 · use dataset: 1

representative citing papers

Leveraging Multimodal Large Language Models for All-in-One Image Restoration via a Mixture of Frequency Experts

cs.CV · 2026-05-12 · unverdicted · novelty 8.0 · 2 refs

An MLLM-guided architecture with a mixture of frequency experts and relational alignment loss achieves state-of-the-art all-in-one image restoration, outperforming prior methods by up to 1.35 dB on the CDD11 dataset.

Why Do Multi-Agent LLM Systems Fail?

cs.AI · 2025-03-17 · unverdicted · novelty 8.0

The authors create the first large-scale dataset and taxonomy of failure modes in multi-agent LLM systems to explain their limited performance gains.

Decomposing Queries into Tool Calls for Long-Video Keyframe Retrieval

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

ToolMerge decomposes queries into LLM-planned tool calls merged by boolean operators for long-video keyframe retrieval and introduces the M2M benchmark, showing competitive results with 5% gains on caption retrieval.

Layer-wise Token Compression for Efficient Document Reranking

cs.IR · 2026-05-20 · unverdicted · novelty 7.0 · 2 refs

Layer-wise Token Compression applies adaptive token pooling at middle transformer layers of cross-encoder rerankers. It preserves MS MARCO ranking quality while raising QPS by up to 25% on passages and 116% on documents. It also yields further gains on listwise LLM rerankers and has a regularizing effect on long inputs.

PopPy: Opportunistically Exploiting Parallelism in Python Compound AI Applications

cs.DC · 2026-05-18 · unverdicted · novelty 7.0

PopPy combines an ahead-of-time compiler and runtime to extract parallelism from Python compound AI applications, delivering up to 6.4x end-to-end speedups while preserving sequential semantics.

SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science

cs.AI · 2026-05-18 · unverdicted · novelty 7.0

SCICONVBENCH is a new benchmark evaluating LLMs on multi-turn disambiguation and inconsistency resolution for task formulation in computational science, with frontier models reaching only 52.7% success on fluid mechanics disambiguation cases.

From Table to Cell: Attention for Better Reasoning with TABALIGN

cs.AI · 2026-05-14 · unverdicted · novelty 7.0

TABALIGN pairs a diffusion language model planner that emits binary cell masks with a trained attention verifier. On eight table benchmarks it raises average accuracy by 15.76 points over strong baselines and speeds up execution by 44.64%.

Large Language Models as Amortized Pareto-Front Generators for Constrained Bi-Objective Convex Optimization

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

DIPS fine-tunes LLMs to output ordered feasible decision vectors approximating Pareto fronts for constrained bi-objective convex problems, reaching 95-98% normalized hypervolume with 0.16s inference.

SMT-Based Active Learning of Weighted Automata

cs.FL · 2026-05-08 · unverdicted · novelty 7.0

An SMT-based active learning algorithm learns minimal nondeterministic weighted automata over arbitrary semirings, with partial correctness proofs, a sufficient termination condition, and experiments showing smaller models and fewer queries than baselines.

The Pinocchio Dimension: Phenomenality of Experience as the Primary Axis of LLM Psychometric Differences

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

The primary axis of psychometric variation among LLMs is the degree to which they represent themselves as loci of phenomenal experience rather than systems of behavioral responses.

Deep Graph-Language Fusion for Structure-Aware Code Generation

cs.SE · 2026-05-05 · unverdicted · novelty 7.0

CGFuse enables deep token-level fusion of graph-derived structural features into language models, yielding 10-16% BLEU and 6-11% CodeBLEU gains on code generation tasks.

Two Calls, Two Moments, and the Vote-Accuracy Curve of Repeated LLM Inference

cs.LG · 2026-05-05 · unverdicted · novelty 7.0 · 2 refs

Two calls per example identify the first two moments of latent correctness probability, enabling exact bounds on the vote-accuracy curve for any majority-vote budget under conditional i.i.d. assumptions.

VOW: Verifiable and Oblivious Watermark Detection for Large Language Models

cs.CR · 2026-04-30 · unverdicted · novelty 7.0

VOW formulates LLM watermark detection as a secure two-party computation using a Verifiable Oblivious Pseudorandom Function to achieve private and cryptographically verifiable detection.

When to Retrieve During Reasoning: Adaptive Retrieval for Large Reasoning Models

cs.IR · 2026-04-29 · unverdicted · novelty 7.0

ReaLM-Retrieve uses step-level uncertainty to trigger retrievals during reasoning, achieving 10.1% better F1 scores and 47% fewer calls on multi-hop QA benchmarks.

Factual and Edit-Sensitive Graph-to-Sequence Generation via Graph-Aware Adaptive Noising

cs.CL · 2026-04-27 · unverdicted · novelty 7.0

DLM4G applies graph-aware adaptive noising in a diffusion framework to generate text from graphs, outperforming larger autoregressive and diffusion baselines in factual grounding and edit sensitivity on three datasets plus molecule captioning.

Exploring Agentic Visual Analytics: A Co-Evolutionary Framework of Roles and Workflows

cs.DB · 2026-04-17 · unverdicted · novelty 7.0

A survey of 55 agentic visual analytics (VA) systems proposes a co-evolutionary framework that defines four agent roles (PLANNER, CREATOR, REVIEWER, CONTEXT MANAGER). It maps these roles to stages of the VA pipeline and offers design guidelines.

DetailVerifyBench: A Benchmark for Dense Hallucination Localization in Long Image Captions

cs.CV · 2026-04-07 · unverdicted · novelty 7.0

DetailVerifyBench supplies 1,000 images and densely annotated long captions to evaluate precise hallucination localization in multimodal large language models.

DP-OPD: Differentially Private On-Policy Distillation for Language Models

cs.LG · 2026-04-06 · unverdicted · novelty 7.0

DP-OPD achieves lower perplexity than DP fine-tuning and synthesis-based private distillation under ε=2.0 by enforcing DP-SGD solely on the student during on-policy training with a frozen teacher.

Spectral Tempering for Embedding Compression in Dense Passage Retrieval

cs.IR · 2026-03-19 · unverdicted · novelty 7.0

Spectral Tempering derives an adaptive scaling factor γ(k) from the embedding eigenspectrum via local SNR analysis and knee-point normalization to achieve near-optimal compression without training or validation.

Pramana: Fine-Tuning Large Language Models for Epistemic Reasoning through Navya-Nyaya

cs.AI · 2026-02-14 · conditional · novelty 7.0

Fine-tuning LLMs on Navya-Nyaya's six-phase reasoning structure yields 100% semantic correctness on held-out logical problems despite only 40% strict format adherence.

DualGuard: Dual-stream Large Language Model Watermarking Defense against Paraphrase and Spoofing Attack

cs.CR · 2025-12-18 · unverdicted · novelty 7.0

DualGuard uses adaptive dual-stream watermark signals to detect and trace both paraphrase and spoofing attacks in LLM outputs while preserving text quality.

Bayesian Social Deduction with Graph-Informed Language Models

cs.AI · 2025-06-21 · unverdicted · novelty 7.0

A hybrid Bayesian-graph LLM agent performs competitively against large models. In controlled Avalon play it achieves a 67% win rate against humans, outperforming baselines and human teammates.

Graph Alignment Topology as an Inductive Bias for Grounding Detection

cs.CL · 2026-05-21 · unverdicted · novelty 6.0

A GNN trained on bipartite alignment graphs between references and LLM generations reports state-of-the-art hallucination detection across four datasets, beating prior methods and GPT-4o.

TRACE: Trajectory Correction from Cross-layer Evidence for Hallucination Reduction

cs.AI · 2026-05-18 · unverdicted · novelty 6.0

TRACE uses cross-layer candidate trajectories inside frozen LLMs to dynamically select and apply one of three correction operators, delivering mean gains of +12.26 MC1 and +8.65 MC2 points across 15 models and 3 benchmarks with no regressions.

citing papers explorer

Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems cs.AI · 2026-05-12 · unreviewed · ref 31
ANCHOR: Abductive Network Construction with Hierarchical Orchestration for Reliable Probability Inference in Large Language Models cs.CL · 2026-05-11 · unreviewed · ref 13 · 2 links
Committed SAE-Feature Traces for Audited-Session Substitution Detection in Hosted LLMs cs.CR · 2026-04-20 · unreviewed · ref 3
Beyond the Crowd: LLM-Augmented Community Notes for Governing Health Misinformation cs.SI · 2025-10-13 · unreviewed · ref 10