{"paper":{"title":"Neurodata Without Boredom: Benchmarking Agentic AI for Data Reuse","license":"http://creativecommons.org/licenses/by/4.0/","headline":"General-purpose AI coding agents handle isolated steps of neuroscience data reformatting but rarely complete error-free end-to-end pipelines.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Kristin Branson, Ling-Qi Zhang","submitted_at":"2026-05-12T23:00:18Z","abstract_excerpt":"Neuroscience data are highly fragmented across labs, formats, and experimental paradigms, and reuse often requires substantial manual effort. A persistent roadblock to data reuse and integration is the need to decipher bespoke and diverse data formatting choices. Common data formats have been proposed in response, but the field continues to struggle with a fundamental tension: formats flexible enough to accommodate diverse experiments are rarely descriptive enough to be self-explanatory, and sufficiently descriptive formats demand detailed documentation and curation effort that few labs can su"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"General-purpose coding agents commonly used by scientists performed well on each sub-task, but rarely strung together a fully error-free end-to-end solution. 
Agents-as-judges are unreliable at catching errors, especially without ground-truth references.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The eight selected papers and their data formats are representative of the broader challenges in neuroscience data reuse and that success on the decoder-training reformatting task is a good proxy for general data-reuse utility.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"AI agents handle individual data-loading and reformatting steps on neuroscience datasets but rarely complete fully error-free end-to-end pipelines, and AI judges are unreliable without ground-truth references.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"General-purpose AI coding agents handle isolated steps of neuroscience data reformatting but rarely complete error-free end-to-end pipelines.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"0993fc3f2e914c12cba85d70e85cc334ab3d3ccfa88875139982721ac1c3aa8f"},"source":{"id":"2605.12808","kind":"arxiv","version":2},"verdict":{"id":"fbe74d8b-bd07-4db5-a33b-1e36e106b370","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T04:48:25.550428Z","strongest_claim":"General-purpose coding agents commonly used by scientists performed well on each sub-task, but rarely strung together a fully error-free end-to-end solution. 
Agents-as-judges are unreliable at catching errors, especially without ground-truth references.","one_line_summary":"AI agents handle individual data-loading and reformatting steps on neuroscience datasets but rarely complete fully error-free end-to-end pipelines, and AI judges are unreliable without ground-truth references.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The eight selected papers and their data formats are representative of the broader challenges in neuroscience data reuse and that success on the decoder-training reformatting task is a good proxy for general data-reuse utility.","pith_extraction_headline":"General-purpose AI coding agents handle isolated steps of neuroscience data reformatting but rarely complete error-free end-to-end pipelines."},"references":{"count":95,"sample":[{"doi":"","year":2017,"title":"The hippocampus as a predictive map","work_id":"4f3679d3-3563-4c46-b577-ab2d0eb060e6","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2019,"title":"Coarse graining, fixed points, and scaling in a large population of neurons","work_id":"02f4c509-ad5b-4891-bc54-e894518449ac","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Space is a latent sequence: A theory of the hippocampus","work_id":"0da8afe7-37bc-4573-a487-3e6e2f47d29c","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"A unified, scalable framework for neural population decoding","work_id":"4f32015b-8613-432a-99d7-3bbc1a57f688","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Foundation model of neural activity predicts response to new stimulus 
types","work_id":"abd68277-402a-4f7c-bce7-e6050fa9de0a","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":95,"snapshot_sha256":"4b8e5c69bedb841af035e7530c1a1fb741c65034c0e9726f875e83c53ec0ea67","internal_anchors":1},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}