Title resolution pending

Daoud, M · 2025 · arXiv 2505.03427

4 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Are Arabic Benchmarks Reliable? QIMMA's Quality-First Approach to LLM Evaluation

cs.CL · 2026-04-03 · unverdicted · novelty 7.0

QIMMA produces a validated multi-domain Arabic LLM benchmark of 52k samples by systematically detecting and correcting quality issues in prior resources via LLM-assisted and human review.

MedGuards: Multi-Agent System for Reliable Medical Error Detection and Correction

cs.CL · 2026-06-24 · unverdicted · novelty 6.0

MedGuards introduces a multi-agent in-context learning framework for medical error detection and correction plus the KPCS metric, reporting improvements on four multilingual clinical note datasets.

Evaluation of Small Language Models for Arabic Language Processing

cs.CL · 2026-06-19 · unverdicted · novelty 6.0

Gemma 3 (12B) scores highest on a new Arabic benchmark, with Arabic alignment and instruction following mattering more than model size.

CLR-voyance: Reinforcing Open-Ended Reasoning for Inpatient Clinical Decision Support with Outcome-Aware Rubrics

cs.CL · 2026-05-10 · unverdicted · novelty 6.0

CLR-voyance reformulates inpatient reasoning as a POMDP with clinician-validated outcome rubrics, yielding an 8B model that outperforms larger frontier models on the authors' new benchmark.

citing papers explorer

Are Arabic Benchmarks Reliable? QIMMA's Quality-First Approach to LLM Evaluation cs.CL · 2026-04-03 · unverdicted · none · ref 4
