ErrorRadar is a new benchmark of 2,500 multimodal K-12 math problems for MLLM error step identification and categorization, where GPT-4o trails human experts by ~10%.
Pefomed: Parameter efficient fine-tuning on multimodal large language models for medical visual question answering
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
Retina-RAG combines a retinal classifier, LoRA-tuned Qwen2.5-VL, and RAG to jointly grade DR, detect ME, and generate reports, reaching F1 scores of 0.731 and 0.948 while exceeding baselines on ROUGE-L and SBERT metrics.
citing papers explorer
-
ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection
ErrorRadar is a new benchmark of 2,500 multimodal K-12 math problems for MLLM error step identification and categorization, where GPT-4o trails human experts by ~10%.
-
Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation
Retina-RAG combines a retinal classifier, LoRA-tuned Qwen2.5-VL, and RAG to jointly grade DR, detect ME, and generate reports, reaching F1 scores of 0.731 and 0.948 while exceeding baselines on ROUGE-L and SBERT metrics.