Learning to complement humans
5 Pith papers cite this work.
Citing papers
- Capabilities of GPT-4 on Medical Challenge Problems
  GPT-4 exceeds the USMLE passing score by more than 20 points and outperforms both GPT-3.5 and the medically fine-tuned Med-PaLM on the MultiMedQA benchmarks.
- AI, Take the Wheel: What Drives Delegation and Trust in Human-Computer Cooperative Question Answering?
  In a competitive QA game, humans under-rely on correct AI suggestions 3.9% of the time and over-rely on incorrect ones 1.7% of the time, driven by confirmation bias and near-chance AI confidence when answers disagree.
- Calibrating conditional risk
  Conditional risk calibration reduces to standard regression and is distinct from probability calibration.
- Medical Model Synthesis Architectures: A Case Study
  The MedMSA framework retrieves knowledge via language models, then builds formal probabilistic models to produce uncertainty-weighted differential diagnoses from symptoms.
- Resume-ing Control: (Mis)Perceptions of Agency Around GenAI Use in Recruiting Workflows
  Recruiters perceive themselves as retaining agency over GenAI in hiring pipelines, yet GenAI invisibly architects core evaluation inputs, producing only marginal efficiency gains at the cost of deskilling.