WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries

Wenting Zhao, Tanya Goyal, Yu-Ying Chiu, Liwei Jiang, Benjamin Newman, Abhilasha Ravichander, et al. · 2024 · arXiv:2407.17468

4 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

To Call or Not to Call: A Framework to Assess and Optimize LLM Tool Calling

cs.AI · 2026-05-01 · unverdicted · novelty 7.0

A normative-descriptive framework shows LLMs' tool-calling perceptions misalign with true need/utility for web search, and hidden-state estimators improve decisions over self-perceived baselines.

Boosting Self-Consistency with Ranking

cs.CL · 2026-06-03 · unverdicted · novelty 6.0

RISC reformulates self-consistency answer selection as a ranking task solved by a lightweight LambdaRank model with five hand-designed features, yielding better accuracy-efficiency trade-offs than majority voting on QA benchmarks.

LoVeC: Reinforcement Learning for Better Verbalized Confidence in Long-Form Generations

cs.CL · 2025-05-29 · unverdicted · novelty 6.0

LoVeC uses RL to train LLMs to output verbalized numerical confidence scores for statements in long-form text, achieving better calibration than self-consistency baselines on QA datasets while being 20x faster.

Per-Entity Bias Mapping for AI Visibility: Why Brand Mentions Require Entity-Specific Calibration

cs.CL · 2026-06-19 · unverdicted · novelty 4.0

Per-Entity Bias Mapping claims aggregate visibility metrics fail because large brands exhibit higher fabricated citation rates than smaller ones in AI responses, attributed to the Brand Hallucination Paradox.

citing papers explorer

Showing 1 of 1 citing paper after filters.

To Call or Not to Call: A Framework to Assess and Optimize LLM Tool Calling

cs.AI · 2026-05-01 · unverdicted · none · ref 54

A normative-descriptive framework shows LLMs' tool-calling perceptions misalign with true need/utility for web search, and hidden-state estimators improve decisions over self-perceived baselines.
