{"paper":{"title":"ArtifactLinker: Linking Scientific Artifacts for Automatic State-of-the-Art Discovery","license":"http://creativecommons.org/licenses/by/4.0/","headline":"An artifact graph of models and datasets lets graph methods rank untested performance links to find new SOTA results.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Bodhisattwa Prasad Majumder, Haofei Yu, Jiaxuan You, Kyle Richardson, Peter Clark","submitted_at":"2026-05-16T09:26:08Z","abstract_excerpt":"Scientific artifacts such as models and datasets are foundations for research. With the rapid growth of platforms like HuggingFace, researchers now have access to a large number of artifacts. Yet, a key challenge remains: how can we automatically discover the state-of-the-art (SOTA) model for a given dataset by fully leveraging existing artifacts? We formalize this task as automatic SOTA discovery by modeling HuggingFace as an artifact graph, where nodes are models/datasets and edges represent evaluations. We propose ArtifactLinker, a two-stage framework: (1) ranking promising unobserved model"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Graph structures between existing artifacts are effective for missing link prediction; end-to-end ranking and verification with ArtifactLinker help discover potential SOTA results and research insights.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"Existing published evaluations form a sufficiently connected and informative graph that unobserved model-dataset performance links can be ranked accurately from graph structure alone.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"ArtifactLinker frames SOTA discovery as missing-link prediction on an artifact graph of models and datasets, with a two-stage ranking-plus-verification pipeline and a 
new benchmark of 14k artifacts.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"An artifact graph of models and datasets lets graph methods rank untested performance links to find new SOTA results.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"5f0405dc064889e85e5f026b7a98883dfdcdd6131d6c6d9f1d83c2d97dee0037"},"source":{"id":"2605.16902","kind":"arxiv","version":1},"verdict":{"id":"13589376-5b49-4df8-ace7-526e676a77bd","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-19T20:25:21.390548Z","strongest_claim":"Graph structures between existing artifacts are effective for missing link prediction; end-to-end ranking and verification with ArtifactLinker help discover potential SOTA results and research insights.","one_line_summary":"ArtifactLinker frames SOTA discovery as missing-link prediction on an artifact graph of models and datasets, with a two-stage ranking-plus-verification pipeline and a new benchmark of 14k artifacts.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"Existing published evaluations form a sufficiently connected and informative graph that unobserved model-dataset performance links can be ranked accurately from graph structure alone.","pith_extraction_headline":"An artifact graph of models and datasets lets graph methods rank untested performance links to find new SOTA 
results."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.16902/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"cited_work_retraction","ran_at":"2026-05-19T20:52:23.353004Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T20:31:40.485435Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_title_agreement","ran_at":"2026-05-19T20:31:19.107673Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"claim_evidence","ran_at":"2026-05-19T18:41:56.275794Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T18:33:26.354878Z","status":"skipped","version":"1.0.0","findings_count":0}],"snapshot_sha256":"77d515ec8e0789caefe000bb2cc5d267b28aef9eda42632b6d60f958dfc53000"},"references":{"count":33,"sample":[{"doi":"","year":null,"title":"On the suitability of Hugging Face Hub for empirical studies. ArXiv, abs/2307.14841","work_id":"2107051f-40e9-47ec-bad4-f1ff041d5889","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Evaluating Sakana's AI Scientist: Bold claims, mixed results","work_id":"39f280b4-8df6-4383-811e-2241b7141020","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2015,"title":"A large annotated corpus for learning natural language inference","work_id":"612282f5-76dd-459e-817d-c47c0485d1ff","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Analyzing the evolution and maintenance of ML models on Hugging Face. 2024 IEEE/ACM 21st International Conference on Mining Software Repositories (MSR), pp","work_id":"62ebf373-d5b9-45ac-949d-cedf5b0946d8","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Joel Castaño, Rafael Cabañas, Antonio Salmerón, David Lo, and Silverio Martínez-Fernández","work_id":"a5ee127c-226e-4e62-b6cf-62edd4528dd9","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":33,"snapshot_sha256":"edca1a8fa8f6647435b47144f8da7c7d0142ca19a0a845056b5597fd88ba404c","internal_anchors":5},"formal_canon":{"evidence_count":2,"snapshot_sha256":"71a84f7aad4957e709a56b79971f8a06c7edb256aeecd160d863a9c18f332cb4"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}