TaskMCQA in English and Spanish

The English translation was performed using GPT-4; the open-ended version was created via rephrasing with Qwen2

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Medmarks: A Comprehensive Open-Source LLM Benchmark Suite for Medical Tasks

cs.CL · 2026-05-02 · unverdicted · novelty 6.0

Medmarks introduces 30 open benchmarks for medical LLM tasks and evaluates 61 models, finding that frontier reasoning models lead, that medically fine-tuned models outperform generalists, and that all models show answer-order bias.

citing papers explorer

Showing 1 of 1 citing paper.

Medmarks: A Comprehensive Open-Source LLM Benchmark Suite for Medical Tasks

cs.CL · 2026-05-02 · unverdicted · none · ref 11
