Title resolution pending

Health system-scale language models are all-purpose prediction engines , volume = · 2023 · DOI 10.1038/s41586-023-06160-y

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open at publisher browse 2 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures

cs.CL · 2026-05-15 · unverdicted · novelty 6.0

PQR framework generates diverse realistic queries to elicit QA agent failures, uncovering 23-78% more unhelpful responses than prior methods in e-commerce agent tests.

Reliability Auditing for Downstream LLM tasks in Psychiatry: LLM-Generated Hospitalization Risk Scores

cs.LG · 2026-04-23 · unverdicted · novelty 5.0

Adding medically insignificant features to prompts causes statistically significant increases in mean predicted hospitalization risk and output variability across four LLMs and four prompt styles on synthetic patient profiles.

citing papers explorer

Showing 2 of 2 citing papers after filters.

PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures cs.CL · 2026-05-15 · unverdicted · none · ref 20
PQR framework generates diverse realistic queries to elicit QA agent failures, uncovering 23-78% more unhelpful responses than prior methods in e-commerce agent tests.
Reliability Auditing for Downstream LLM tasks in Psychiatry: LLM-Generated Hospitalization Risk Scores cs.LG · 2026-04-23 · unverdicted · none · ref 4
Adding medically insignificant features to prompts causes statistically significant increases in mean predicted hospitalization risk and output variability across four LLMs and four prompt styles on synthetic patient profiles.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer