{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:TXZZOX6HIQM2T2SRD4LEUKCH7Q","short_pith_number":"pith:TXZZOX6H","schema_version":"1.0","canonical_sha256":"9df3975fc74419a9ea511f164a2847fc16168471c8c5f85693b29d1d14cd6bef","source":{"kind":"arxiv","id":"2506.07044","version":4},"attestation_state":"computed","paper":{"title":"Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Lingshu, a medical multimodal model, outperforms open-source peers on visual QA, text QA, and report generation after targeted data curation and staged training.","cross_cats":["cs.AI","cs.CV"],"primary_cat":"cs.CL","authors_text":"Chaojun Wang, Chaoqun Liu, Chenghao Xiao, Deli Zhao, Guizhen Chen, Hao Zhang, Hou Pong Chan, Jianyu Wang, Jie Tan, Junao Shen, LASA Team, Long Li, Mahani Aljunied, Ruifeng Yuan, Tingyang Xu, Weiwen Xu, Yu Rong, Yu Sun, Zhaodonghui Li","submitted_at":"2025-06-08T08:47:30Z","abstract_excerpt":"Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities in understanding common visual elements, largely due to their large-scale datasets and advanced training strategies. However, their effectiveness in medical applications remains limited due to the inherent discrepancies between data and tasks in medical scenarios and those in the general domain. 
Concretely, existing medical MLLMs face the following critical limitations: (1) limited coverage of medical knowledge beyond imaging, (2) heightened susceptibility to hallucinations due to suboptimal data curation proces"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2506.07044","kind":"arxiv","version":4},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.CL","submitted_at":"2025-06-08T08:47:30Z","cross_cats_sorted":["cs.AI","cs.CV"],"title_canon_sha256":"6b6ef6f5d5401cee547e3c2a4f28813c735544e11425063d383145f42c3a8f48","abstract_canon_sha256":"226cc92a595c243d9368b89129df002f3da8ed9c2b3bb3629f3562c4024a07dd"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:52.714235Z","signature_b64":"wJhNvdwjoGgllMRFecCpCTNzYi+rAPVmB/kw8zndhdy6Re0pHTzi1Rad80pXDfP8w59WPa/zbv4yOWJdUY39DA==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"9df3975fc74419a9ea511f164a2847fc16168471c8c5f85693b29d1d14cd6bef","last_reissued_at":"2026-05-17T23:38:52.713597Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:52.713597Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Lingshu, a medical multimodal model, outperforms open-source peers on visual QA, text QA, and report generation after targeted data curation and staged 
training.","cross_cats":["cs.AI","cs.CV"],"primary_cat":"cs.CL","authors_text":"Chaojun Wang, Chaoqun Liu, Chenghao Xiao, Deli Zhao, Guizhen Chen, Hao Zhang, Hou Pong Chan, Jianyu Wang, Jie Tan, Junao Shen, LASA Team, Long Li, Mahani Aljunied, Ruifeng Yuan, Tingyang Xu, Weiwen Xu, Yu Rong, Yu Sun, Zhaodonghui Li","submitted_at":"2025-06-08T08:47:30Z","abstract_excerpt":"Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities in understanding common visual elements, largely due to their large-scale datasets and advanced training strategies. However, their effectiveness in medical applications remains limited due to the inherent discrepancies between data and tasks in medical scenarios and those in the general domain. Concretely, existing medical MLLMs face the following critical limitations: (1) limited coverage of medical knowledge beyond imaging, (2) heightened susceptibility to hallucinations due to suboptimal data curation proces"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"The results show that Lingshu consistently outperforms the existing open-source multimodal models on most tasks across multimodal QA, text-based QA, and medical report generation.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the synthesized captions, VQA pairs, and reasoning samples produced by the data curation procedure are accurate and free of hallucinations or factual errors that would propagate into the model.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Lingshu is a medical-specialized multimodal LLM that outperforms prior open-source models on multimodal QA, text QA, and report generation after training on a large curated dataset of medical 
knowledge.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Lingshu, a medical multimodal model, outperforms open-source peers on visual QA, text QA, and report generation after targeted data curation and staged training.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"f558e02e41627c9200808f56505635e983a180b5fbc30d6d27079d0efa89fed9"},"source":{"id":"2506.07044","kind":"arxiv","version":4},"verdict":{"id":"9ed62906-f524-4a69-9728-99782ddecbb8","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T11:00:52.025848Z","strongest_claim":"The results show that Lingshu consistently outperforms the existing open-source multimodal models on most tasks across multimodal QA, text-based QA, and medical report generation.","one_line_summary":"Lingshu is a medical-specialized multimodal LLM that outperforms prior open-source models on multimodal QA, text QA, and report generation after training on a large curated dataset of medical knowledge.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the synthesized captions, VQA pairs, and reasoning samples produced by the data curation procedure are accurate and free of hallucinations or factual errors that would propagate into the model.","pith_extraction_headline":"Lingshu, a medical multimodal model, outperforms open-source peers on visual QA, text QA, and report generation after targeted data curation and staged training."},"references":{"count":95,"sample":[{"doi":"10.1016/j.artmed.2024.103001","year":2024,"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","ref_index":1,"cited_arxiv_id":"2501.12948","is_internal_anchor":true},{"doi":"","year":2003,"title":"Meddr: Diagnosis-guided bootstrapping for large-scale medical vision-language 
learning. arXiv preprint arXiv:2404.15127","work_id":"207c60b0-9d88-4800-98ba-632338b44498","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Maira-1: A specialised large multimodal model for radiology report generation. arXiv preprint arXiv:2311.13668","work_id":"ef50f610-9e4e-4d5b-8c85-35e180f09d5b","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.1609/aaai.v33i01.3301590","year":2021,"title":"https://doi.org/10.1609/aaai.v33i01.3301590. Saahil Jain, Ashwin Agrawal, Adriel Saporta, Steven Truong, Du Nguyen Duong Nguyen Duong, Tan Bui, Pierre Chambon, Yuhao Zhang, Matthew Lungren, Andrew Ng,","work_id":"616e2721-84a1-4acc-9d4a-b69c0f37397e","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.1038/s41597-019-0322-0","year":2076,"title":"URLhttps://www.nature.com/articles/s41597-019-0322-0","work_id":"b5ff10f9-da4b-488f-8068-51b18358da75","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":95,"snapshot_sha256":"625d29f6688fcb73cbefc337e2e0e611cea9e0cad003042292aafc160f11a820","internal_anchors":6},"formal_canon":{"evidence_count":3,"snapshot_sha256":"b9f2ac25b0da3f84be2bb4611eaa8ff55abd71b1edf15f8ddef8e4924ad8411b"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2506.07044","created_at":"2026-05-17T23:38:52.713707+00:00"},{"alias_kind":"arxiv_version","alias_value":"2506.07044v4","created_at":"2026-05-17T23:38:52.713707+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2506.07044","created_at":"2026-05-17T23:38:52.713707+00:00"},{"alias_kind":"pith_short_12","alias_value":"TXZZOX6HIQM2","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"TXZZOX6HIQM2T2SR","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","
alias_value":"TXZZOX6H","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":38,"internal_anchor_count":38,"sample":[{"citing_arxiv_id":"2605.22872","citing_title":"MedExpMem: Adapting Experience Memory for Differential Diagnosis","ref_index":24,"is_internal_anchor":true},{"citing_arxiv_id":"2605.23629","citing_title":"DDX-TRACE: A Benchmark for Medical Diagnostic Trajectories in VLMs","ref_index":35,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21652","citing_title":"Look-Closer-Then-Diagnose: Confidence-Aware Ultrasound VQA via Active Zooming","ref_index":27,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21852","citing_title":"Seizure-Semiology-Suite (S3): A Clinically Multimodal Dataset, Benchmark, and Models for Seizure Semiology Understanding","ref_index":131,"is_internal_anchor":true},{"citing_arxiv_id":"2605.22080","citing_title":"JMed48k: A Multi-Profession Japanese Medical Licensing Benchmark for Vision-Language Model Evaluation","ref_index":27,"is_internal_anchor":true},{"citing_arxiv_id":"2605.22414","citing_title":"Towards Clinically Interpretable Ophthalmic VQA via Spatially-Grounded Lesion Evidence","ref_index":25,"is_internal_anchor":true},{"citing_arxiv_id":"2604.09450","citing_title":"ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion","ref_index":52,"is_internal_anchor":true},{"citing_arxiv_id":"2605.20277","citing_title":"Regulating Anatomy-Aware Rewards via Trajectory-Integral Feedback for Volumetric Computed Tomography Analysis","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2605.20772","citing_title":"VIHD: Visual Intervention-based Hallucination Detection for Medical Visual Question Answering","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2508.12778","citing_title":"HeteroRAG: A Heterogeneous Retrieval-Augmented Generation Framework for Medical Vision Language 
Tasks","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2510.11423","citing_title":"Beyond the Crowd: LLM-Augmented Community Notes for Governing Health Misinformation","ref_index":42,"is_internal_anchor":true},{"citing_arxiv_id":"2510.26083","citing_title":"Nirvana: A Specialized Generalist Model With Task-Aware Memory Mechanism","ref_index":45,"is_internal_anchor":true},{"citing_arxiv_id":"2512.22278","citing_title":"FETAL-GAUGE: A Benchmark for Assessing Vision-Language Models in Fetal Ultrasound","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2601.03054","citing_title":"IBISAgent: Reinforcing Pixel-Level Visual Reasoning in MLLMs for Universal Biomedical Object Referring and Segmentation","ref_index":44,"is_internal_anchor":true},{"citing_arxiv_id":"2602.12705","citing_title":"MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs","ref_index":62,"is_internal_anchor":true},{"citing_arxiv_id":"2603.06665","citing_title":"Better Eyes, Better Thoughts: Why Vision Chain-of-Thought Fails in Medicine","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2603.13779","citing_title":"AD-Copilot: A Vision-Language Assistant for Industrial Anomaly Detection via Visual In-context Comparison","ref_index":2,"is_internal_anchor":true},{"citing_arxiv_id":"2604.08559","citing_title":"Medical Reasoning with Large Language Models: A Survey and MR-Bench","ref_index":47,"is_internal_anchor":true},{"citing_arxiv_id":"2603.18545","citing_title":"CoDA: Exploring Chain-of-Distribution Attacks and Post-Hoc Token-Space Repair for Medical Vision-Language Models","ref_index":22,"is_internal_anchor":true},{"citing_arxiv_id":"2603.20698","citing_title":"Clinical Cognition Alignment for Gastrointestinal Diagnosis with Multimodal LLMs","ref_index":45,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12650","citing_title":"CRAFT: Clinical Reward-Aligned Finetuning for Medical Image 
Synthesis","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2604.10755","citing_title":"MMRareBench: A Rare-Disease Multimodal and Multi-Image Medical Benchmark","ref_index":28,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11931","citing_title":"Learn to Think: Improving Multimodal Reasoning through Vision-Aware Self-Improvement Training","ref_index":19,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08787","citing_title":"Lost in Volume: The CT-SpatialVQA Benchmark for Evaluating Semantic-Spatial Understanding of 3D Medical Vision-Language Models","ref_index":21,"is_internal_anchor":true},{"citing_arxiv_id":"2604.25296","citing_title":"Learning from Medical Entity Trees: An Entity-Centric Medical Data Engineering Framework for MLLMs","ref_index":36,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":3,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/TXZZOX6HIQM2T2SRD4LEUKCH7Q","json":"https://pith.science/pith/TXZZOX6HIQM2T2SRD4LEUKCH7Q.json","graph_json":"https://pith.science/api/pith-number/TXZZOX6HIQM2T2SRD4LEUKCH7Q/graph.json","events_json":"https://pith.science/api/pith-number/TXZZOX6HIQM2T2SRD4LEUKCH7Q/events.json","paper":"https://pith.science/paper/TXZZOX6H"},"agent_actions":{"view_html":"https://pith.science/pith/TXZZOX6HIQM2T2SRD4LEUKCH7Q","download_json":"https://pith.science/pith/TXZZOX6HIQM2T2SRD4LEUKCH7Q.json","view_paper":"https://pith.science/paper/TXZZOX6H","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2506.07044&json=true","fetch_graph":"https://pith.science/api/pith-number/TXZZOX6HIQM2T2SRD4LEUKCH7Q/graph.json","fetch_events":"https://pith.science/api/pith-number/TXZZOX6HIQM2T2SRD4LEUKCH7Q/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/TXZZOX6HIQM2T2SRD4LEUKCH7Q/action/timestamp_anchor","attest_storage":"https://pith.science/pith/TXZZOX6HIQM2T2SRD4LEUKCH7Q/action/storage_attestation","attest_author":"https://pith.science/pith/TXZZOX6HIQM2T2SRD4LEUKCH
7Q/action/author_attestation","sign_citation":"https://pith.science/pith/TXZZOX6HIQM2T2SRD4LEUKCH7Q/action/citation_signature","submit_replication":"https://pith.science/pith/TXZZOX6HIQM2T2SRD4LEUKCH7Q/action/replication_record"}},"created_at":"2026-05-17T23:38:52.713707+00:00","updated_at":"2026-05-17T23:38:52.713707+00:00"}