Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
Adding medically insignificant features to prompts causes statistically significant increases in mean predicted hospitalization risk and output variability across four LLMs and four prompt styles on synthetic patient profiles.
Citing papers explorer
- PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures
  PQR framework generates diverse realistic queries to elicit QA agent failures, uncovering 23-78% more unhelpful responses than prior methods in e-commerce agent tests.
- Reliability Auditing for Downstream LLM tasks in Psychiatry: LLM-Generated Hospitalization Risk Scores
  Adding medically insignificant features to prompts causes statistically significant increases in mean predicted hospitalization risk and output variability across four LLMs and four prompt styles on synthetic patient profiles.