Title” is the title of the source case report that the question-answer pair was derived from, “pmc id

URL https://arxiv · 2024 · DOI 10.1038/s41586-025-10097-9

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

MedOpenClaw and MedFlowBench: Auditing Medical Agents in Full-Study Workflows

cs.CV · 2026-03-25 · conditional · novelty 8.0

MedFlowBench evaluates VLM agents on full radiology and pathology studies by requiring both task answers and verifiable evidence like key slices and regions of interest, revealing that answer-only scores overestimate performance.

An agentic framework for gravitational-wave counterpart association in the multi-messenger era

astro-ph.IM · 2026-05-11 · unverdicted · novelty 7.0

GW-Eyes is a new LLM-powered agent framework that autonomously associates gravitational-wave events with electromagnetic counterparts by integrating specialized tools and supporting natural-language interaction.

When Cases Get Rare: A Retrieval Benchmark for Off-Guideline Clinical Question Answering

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

OGCaReBench is a new retrieval-focused benchmark for evaluating LLMs on off-guideline clinical questions from real case reports, showing retrieval augmentation raises accuracy from 56% to 82%.

Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments

cs.CL · 2026-05-05 · unverdicted · novelty 6.0

LaaB improves LLM hallucination detection by mapping self-judgment labels back into neural feature space and using mutual learning under logical consistency constraints between responses and meta-judgments.

TrajOnco: a multi-agent framework for temporal reasoning over longitudinal EHR for multi-cancer early detection

cs.AI · 2026-04-12 · unverdicted · novelty 6.0

TrajOnco uses a chain-of-agents LLM architecture with memory to perform temporal reasoning on longitudinal EHR, achieving 0.64-0.80 AUROC for 1-year multi-cancer risk prediction in zero-shot mode on matched cohorts while matching supervised ML on lung cancer and outperforming single-agent baselines.

citing papers explorer

Showing 2 of 2 citing papers after filters.

When Cases Get Rare: A Retrieval Benchmark for Off-Guideline Clinical Question Answering

cs.CL · 2026-05-20 · unverdicted · none · ref 4

OGCaReBench is a new retrieval-focused benchmark for evaluating LLMs on off-guideline clinical questions from real case reports, showing retrieval augmentation raises accuracy from 56% to 82%.

Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments

cs.CL · 2026-05-05 · unverdicted · none · ref 81

LaaB improves LLM hallucination detection by mapping self-judgment labels back into neural feature space and using mutual learning under logical consistency constraints between responses and meta-judgments.
