Learning to complement humans

The impact of large language models in finance: Towards trustworthy adoption · 2024 · arXiv 2005.00582

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Capabilities of GPT-4 on Medical Challenge Problems

cs.CL · 2023-03-20 · unverdicted · novelty 7.0

GPT-4 exceeds the USMLE passing score by more than 20 points and outperforms both GPT-3.5 and the medically fine-tuned Med-PaLM on the MultiMedQA benchmarks.

AI, Take the Wheel: What Drives Delegation and Trust in Human-Computer Cooperative Question Answering?

cs.AI · 2026-05-27 · unverdicted · novelty 6.0

In a competitive QA game, humans under-rely on correct AI suggestions 3.9% of the time and over-rely on incorrect ones 1.7% of the time, driven by confirmation bias and near-chance AI confidence when answers disagree.

Calibrating conditional risk

cs.LG · 2026-04-22 · unverdicted · novelty 6.0

Conditional risk calibration reduces to standard regression and is distinct from probability calibration.

Medical Model Synthesis Architectures: A Case Study

cs.AI · 2026-05-10 · unverdicted · novelty 5.0

The MedMSA framework retrieves knowledge via language models, then builds formal probabilistic models to produce uncertainty-weighted differential diagnoses from symptoms.

Resume-ing Control: (Mis)Perceptions of Agency Around GenAI Use in Recruiting Workflows

cs.CY · 2026-04-29 · unverdicted · novelty 5.0

Recruiters perceive themselves as retaining agency over GenAI in hiring pipelines, yet GenAI invisibly architects core evaluation inputs, producing only marginal efficiency gains at the cost of deskilling.

citing papers explorer

Showing 5 of 5 citing papers after filters.

Capabilities of GPT-4 on Medical Challenge Problems · cs.CL · 2023-03-20 · unverdicted · none · ref 20
AI, Take the Wheel: What Drives Delegation and Trust in Human-Computer Cooperative Question Answering? · cs.AI · 2026-05-27 · unverdicted · none · ref 3
Calibrating conditional risk · cs.LG · 2026-04-22 · unverdicted · none · ref 19
Medical Model Synthesis Architectures: A Case Study · cs.AI · 2026-05-10 · unverdicted · none · ref 291
Resume-ing Control: (Mis)Perceptions of Agency Around GenAI Use in Recruiting Workflows · cs.CY · 2026-04-29 · unverdicted · none · ref 75
