{"paper":{"title":"LLM Jaggedness Unlocks Scientific Creativity","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Uneven capabilities across LLMs allow model combinations to generate more scientific ideas than any single model.","cross_cats":[],"primary_cat":"cs.AI","authors_text":"Esther H. R. Tsai, J. Anibal Boscoboinik, Kevin G. Yager, Shray Mathur","submitted_at":"2026-05-11T13:47:48Z","abstract_excerpt":"As artificial intelligence advances, models are not improving uniformly. Instead, progress unfolds in a jagged fashion, with capabilities growing unevenly across tasks, domains, and model scales. In this work, we examine this dynamic jaggedness through the lens of scientific idea generation. We introduce SciAidanBench, a benchmark of open-ended scientific questions designed to measure the scientific creativity of large language models (LLMs). Given a scientific question, models are asked to generate as many unique and coherent ideas as possible, with the total number of valid responses serving"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"we show that this jaggedness can be harnessed. 
We explore mechanisms of inference-time compute, knowledge pooling, and brainstorming to combine models effectively and construct meta-model ensembles that outperform any single model.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the total number of valid responses is a reliable proxy for creative potential, and that human or automated judgment of uniqueness and coherence accurately captures scientific creativity without systematic bias.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"LLMs exhibit jagged scientific creativity across models, prompts, and domains, and this unevenness can be leveraged via model ensembles to outperform any single model on idea generation.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Uneven capabilities across LLMs allow model combinations to generate more scientific ideas than any single model.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"e619787ddb22637bc360a125dd716fb789188dbf97233fb69c627ad6566f93ee"},"source":{"id":"2605.10574","kind":"arxiv","version":2},"verdict":{"id":"011f6e7f-8ad4-40b8-bb4a-ad3d4b38a771","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-12T04:03:19.429350Z","strongest_claim":"we show that this jaggedness can be harnessed. 
We explore mechanisms of inference-time compute, knowledge pooling, and brainstorming to combine models effectively and construct meta-model ensembles that outperform any single model.","one_line_summary":"LLMs exhibit jagged scientific creativity across models, prompts, and domains, and this unevenness can be leveraged via model ensembles to outperform any single model on idea generation.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the total number of valid responses is a reliable proxy for creative potential, and that human or automated judgment of uniqueness and coherence accurately captures scientific creativity without systematic bias.","pith_extraction_headline":"Uneven capabilities across LLMs allow model combinations to generate more scientific ideas than any single model."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.10574/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"claim_evidence","ran_at":"2026-05-20T05:42:00.897549Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T14:41:06.809597Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_title_agreement","ran_at":"2026-05-19T11:01:17.674593Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T09:10:30.957191Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"dd350d16aece251404b101e6088ffdcd522dce9507687c2833a9b2916b7257b0"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}