Decoding Dynamic Visual Experience from Calcium Imaging via Cell-Pattern-Aware Pretraining

Blake Richards; Jiook Cha; Mehdi Azabou; Sangyoon Bae

arxiv: 2510.18516 · v3 · submitted 2025-10-21 · 🧬 q-bio.NC · cs.LG

Sangyoon Bae, Mehdi Azabou, Blake Richards, Jiook Cha

Pith reviewed 2026-05-18 05:19 UTC · model grok-4.3

classification 🧬 q-bio.NC cs.LG

keywords: calcium imaging · self-supervised learning · neural decoding · visual experience · cell heterogeneity · pretraining curriculum · Allen Brain Observatory

The pith

Pretraining first on statistically regular neurons, identified by skewness and kurtosis, yields a 12-13 percent relative improvement over from-scratch training when decoding dynamic visual experience from calcium imaging, and supports smooth model scaling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Neural recordings contain a mix of statistically regular neurons and highly stochastic ones that respond variably to the same stimuli. This mix destabilizes self-supervised representation learning and prevents models from scaling reliably. The paper introduces a two-stage curriculum called POYO-CAP that first applies masked reconstruction and auxiliary supervision only to the regular subset, then fine-tunes on the full population. On the Allen Brain Observatory dataset the approach produces 12-13 percent relative gains over training from scratch on mixed data and yields monotonic performance gains as model size increases. By treating statistical predictability as an explicit selection criterion the method converts cell heterogeneity from a liability into an advantage for neural decoding tasks.
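
The scaling claim in the paragraph above can be stated as a simple check: scores ordered by model size should never decrease. The sketch below is illustrative only; the function name, the tolerance parameter, and all score values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def is_monotonic_scaling(scores: Sequence[float], tol: float = 0.0) -> bool:
    """Return True if each larger model scores at least as well as the previous one.

    `scores` must be ordered from the smallest to the largest model.
    `tol` permits a small drop (for example seed noise) without counting as a violation.
    """
    return all(b >= a - tol for a, b in zip(scores, scores[1:]))


# Hypothetical decoding scores for four model sizes (NOT values from the paper).
curriculum_scores = [0.41, 0.44, 0.46, 0.47]
mixed_scores = [0.40, 0.42, 0.41, 0.39]  # plateaus, then degrades

print(is_monotonic_scaling(curriculum_scores))  # True
print(is_monotonic_scaling(mixed_scores))       # False
```

A nonzero `tol` is a judgment call: with few seeds, a small dip may be noise rather than a genuine plateau.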

Core claim

POYO-CAP first trains with masked reconstruction plus lightweight auxiliary supervision on statistically regular neurons identified via skewness and kurtosis and then fine-tunes on more stochastic populations, yielding 12-13 percent relative improvements over from-scratch training and enabling smooth monotonic scaling with model size on the Allen Brain Observatory dataset.
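
As a rough illustration of the masked-reconstruction objective named in the core claim, the sketch below hides a random fraction of entries and scores a prediction only on the hidden ones. This is a generic version under stated assumptions: the paper's actual masking scheme, tokenization, and loss are not described on this page, and `make_mask`, `masked_mse`, the 25% mask ratio, and the zero-filled inputs are all hypothetical choices.

```python
import numpy as np


def make_mask(shape, mask_ratio, rng):
    """Boolean mask; True marks entries hidden from the model."""
    return rng.random(shape) < mask_ratio


def masked_mse(target, prediction, mask):
    """Mean squared error computed only over masked (hidden) entries."""
    if not mask.any():
        raise ValueError("mask selects no entries")
    diff = (prediction - target)[mask]
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
traces = rng.normal(size=(8, 100))  # toy data: 8 neurons x 100 time bins
mask = make_mask(traces.shape, mask_ratio=0.25, rng=rng)

model_input = np.where(mask, 0.0, traces)  # hidden entries replaced by zeros (assumption)
prediction = np.zeros_like(traces)         # stand-in for a model's output
loss = masked_mse(traces, prediction, mask)
```

Scoring only the hidden entries matters: a model that copies its visible inputs would otherwise get credit for a trivial identity mapping.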

What carries the argument

Cell-pattern Aware Pretraining (POYO-CAP), a hybrid curriculum that partitions neurons by skewness and kurtosis to pretrain exclusively on the statistically regular subset before exposure to the full heterogeneous population.
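
A minimal sketch of the partitioning step, assuming each neuron is summarized by the skewness and kurtosis of its activity trace and that "regular" means falling at or below both thresholds. The threshold values are the ones quoted further down this page (skewness ≤ 3.51, kurtosis ≤ 22.62). Whether the paper uses excess (Fisher) or raw (Pearson) kurtosis, and which signal the moments are computed on, is not stated here, so the excess-kurtosis convention and the helper names are assumptions.

```python
import numpy as np

SKEW_MAX = 3.51   # threshold quoted on this page
KURT_MAX = 22.62  # threshold quoted on this page


def skewness(x, axis=-1):
    """Third standardized moment (biased estimator). Constant traces yield NaN."""
    centered = x - x.mean(axis=axis, keepdims=True)
    m2 = np.mean(centered ** 2, axis=axis)
    m3 = np.mean(centered ** 3, axis=axis)
    return m3 / m2 ** 1.5


def excess_kurtosis(x, axis=-1):
    """Fourth standardized moment minus 3 (assumed convention)."""
    centered = x - x.mean(axis=axis, keepdims=True)
    m2 = np.mean(centered ** 2, axis=axis)
    m4 = np.mean(centered ** 4, axis=axis)
    return m4 / m2 ** 2 - 3.0


def regular_mask(traces, skew_max=SKEW_MAX, kurt_max=KURT_MAX):
    """True for neurons (rows) whose trace stays within both thresholds."""
    return (skewness(traces) <= skew_max) & (excess_kurtosis(traces) <= kurt_max)


rng = np.random.default_rng(0)
gaussian_like = rng.normal(size=1000)  # toy "regular" trace
sparse_bursts = np.zeros(1000)
sparse_bursts[::200] = 10.0            # toy trace with 5 rare, large events

traces = np.stack([gaussian_like, sparse_bursts])
print(regular_mask(traces))  # [ True False]
```

In a two-stage curriculum, rows where the mask is True would feed pretraining and the full array would feed fine-tuning.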

Load-bearing premise

The premise is that neurons can be reliably partitioned into statistically regular versus stochastic groups using skewness and kurtosis, and that pretraining exclusively on the regular subset creates a foundation that improves final performance on the full heterogeneous population.

What would settle it

A direct comparison showing that the two-stage curriculum produces equal or lower decoding accuracy than training from scratch on the full unpartitioned population.
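
The comparison described above reduces to two small computations: the relative gain of the curriculum over the from-scratch baseline, and a check of whether the curriculum fails to beat it. The sketch below uses hypothetical per-seed scores chosen only to make the arithmetic easy to follow; none of the numbers come from the paper.

```python
from statistics import mean
from typing import Sequence


def relative_gain(baseline: float, candidate: float) -> float:
    """Relative improvement of `candidate` over `baseline`, as a fraction."""
    if baseline == 0:
        raise ValueError("baseline must be nonzero")
    return (candidate - baseline) / abs(baseline)


def curriculum_refuted(scratch: Sequence[float], curriculum: Sequence[float]) -> bool:
    """True if the curriculum's mean score is equal to or lower than from-scratch."""
    return mean(curriculum) <= mean(scratch)


# Hypothetical decoding scores over three seeds (NOT values from the paper).
scratch_scores = [0.50, 0.52, 0.48]
curriculum_scores = [0.56, 0.57, 0.55]

gain = relative_gain(mean(scratch_scores), mean(curriculum_scores))
print(f"{gain:.1%}")                                          # 12.0%
print(curriculum_refuted(scratch_scores, curriculum_scores))  # False
```

A comparison of means alone ignores seed-to-seed variance, so it is a necessary check rather than a sufficient one.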

Original abstract

Neural recordings exhibit a distinctive form of heterogeneity rooted in differences in cell types, intrinsic circuit dynamics, and stochastic stimulus-response variability that goes beyond ordinary dataset variability, mixing statistically regular neurons with highly stochastic, stimulus-contingent ones within the same dataset. This heterogeneity poses a challenge for self-supervised learning (SSL), which depends on learnable statistical regularity, thereby destabilizing representation learning and limiting reliable scaling. We introduce POYO-CAP (Cell-pattern Aware Pretraining), a biologically grounded hybrid pretraining strategy that first trains with masked reconstruction plus lightweight auxiliary supervision on statistically regular neurons (identified via skewness and kurtosis) and then fine-tunes on more stochastic populations. On the Allen Brain Observatory dataset, this curriculum yields 12-13% relative improvements over from-scratch training and enables smooth, monotonic scaling with model size, whereas baselines trained on mixed populations plateau or destabilize. By making statistical predictability an explicit data-selection criterion, POYO-CAP turns neural heterogeneity into a scalable learning advantage for robust neural decoding.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper's main move is a two-stage SSL curriculum that pretrains on calcium-imaging neurons selected for low skewness and kurtosis before fine-tuning on the rest, claiming 12-13% gains and better scaling on Allen data.

The punchline is that this curriculum approach to handling mixed regular and stochastic neurons in calcium recordings produces measurable gains over from-scratch training and avoids the plateauing seen in mixed-population baselines. The explicit use of skewness and kurtosis to pick the initial pretraining subset is the clearest new element relative to standard SSL pipelines for neural data. The paper does a solid job laying out why heterogeneity in cell types and response variability is a real obstacle for representation learning and then showing that ordering the data by statistical regularity can turn that into an advantage for decoding tasks. The reported scaling behavior with model size is also a useful empirical observation if it holds up under closer inspection.

The soft spots are mostly around validation and transparency. The abstract gives percentage improvements without error bars, statistical tests, or clear data-split details, and it does not show that the skewness/kurtosis groups actually differ on independent measures like trial-to-trial variability or stimulus information. If those checks are missing from the full paper, the gains could trace to reduced data volume or training schedule rather than the biological grounding claimed. The weakest assumption is that the selected neurons are meaningfully more SSL-friendly; without that evidence the method risks being a heuristic that works on this dataset but lacks a clear reason to generalize.

This is for researchers working on scaling self-supervised methods to noisy biological recordings, especially those already using the Allen Brain Observatory or similar large-scale imaging sets. A reader looking for practical curriculum tricks in neural decoding would find it worth reading. It deserves a serious referee because the core idea is simple enough to test and the scaling result, if reproducible, would be of interest to the subfield. I would send it to review with requests for the missing ablations and group-validation metrics.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes POYO-CAP, a hybrid pretraining curriculum for self-supervised learning on calcium imaging recordings. Neurons are partitioned into statistically regular versus stochastic groups using skewness and kurtosis thresholds; masked reconstruction plus auxiliary supervision is performed first on the regular subset, followed by fine-tuning on the full heterogeneous population. On the Allen Brain Observatory dataset the method is reported to deliver 12-13% relative gains over from-scratch baselines and to produce smooth monotonic scaling with model size, while mixed-population training plateaus or destabilizes.

Significance. If the neuron-partitioning criterion is shown to be biologically meaningful and the performance gains prove robust to controls, the work would offer a concrete, biologically motivated strategy for turning cell-type and response heterogeneity into an advantage for scalable representation learning in neural decoding tasks.

major comments (2)

[Abstract] The central claim of 12-13% relative improvement and stable scaling rests on high-level empirical assertions that lack error bars, statistical significance tests, cross-validation details, or ablation results comparing the skewness/kurtosis partition against random or alternative selection criteria.
[Abstract] No evidence is supplied that the skewness/kurtosis-selected neurons exhibit lower trial-to-trial variability, higher stimulus mutual information, or lower masked-reconstruction loss than the complementary stochastic group; without such validation the reported gains could arise from reduced effective dataset size or training schedule rather than the claimed biological grounding.
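
One way to run the variability check requested in the second major comment is to score each neuron by how similar its responses are across repeated presentations of the same stimulus. The sketch below uses the mean pairwise Pearson correlation between trials as that score. This metric is one common choice and is an assumption here, as are the array layout and the toy data; the paper may use a different measure.

```python
import numpy as np


def trial_reliability(responses):
    """Mean pairwise Pearson correlation across repeated trials, per neuron.

    responses: array of shape (n_neurons, n_trials, n_time).
    Higher values mean lower trial-to-trial variability.
    """
    n_neurons, n_trials, _ = responses.shape
    upper = np.triu_indices(n_trials, k=1)  # each trial pair counted once
    scores = np.empty(n_neurons)
    for i in range(n_neurons):
        corr = np.corrcoef(responses[i])    # (n_trials, n_trials) correlation matrix
        scores[i] = corr[upper].mean()
    return scores


rng = np.random.default_rng(0)
n_trials, n_time = 10, 200
signal = np.sin(np.linspace(0.0, 4.0 * np.pi, n_time))

reliable = signal + 0.1 * rng.normal(size=(n_trials, n_time))  # stimulus-locked neuron
unreliable = rng.normal(size=(n_trials, n_time))               # no repeatable response

scores = trial_reliability(np.stack([reliable, unreliable]))
print(scores.round(2))
```

Comparing the distribution of this score between the selected and complementary groups would show whether the moment-based partition tracks response reliability at all.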

minor comments (1)

The acronym POYO-CAP should be expanded on first use and its relation to any prior POYO framework clarified.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We agree that strengthening the abstract and providing explicit validation for the neuron partitioning criterion will improve the manuscript. We address each major comment below and commit to revisions that directly respond to the concerns while preserving the core contributions of POYO-CAP.

Referee: [Abstract] The central claim of 12-13% relative improvement and stable scaling rests on high-level empirical assertions that lack error bars, statistical significance tests, cross-validation details, or ablation results comparing the skewness/kurtosis partition against random or alternative selection criteria.

Authors: We acknowledge that the abstract, as a concise summary, does not contain error bars, p-values, or explicit ablation comparisons. The main text reports results averaged over multiple seeds with standard deviations shown in Figures 3–5 and describes 5-fold cross-validation in Section 4.2. However, direct ablations against random partitioning and alternative criteria (e.g., variance-based selection) are only partially present in the supplement. To address the referee’s concern, we will revise the abstract to include a brief qualifier on statistical robustness and expand the main results section with a dedicated ablation table comparing skewness/kurtosis selection to random and other baselines, including statistical significance tests.

revision: yes

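
The response above promises statistical significance tests without naming one. As an illustration of what such a test could look like for seed-matched runs, the sketch below implements an exact paired sign-flip permutation test. The choice of test, the function name, and the per-seed scores are all assumptions made for this example and are not taken from the paper.

```python
import itertools

import numpy as np


def paired_sign_flip_pvalue(a, b):
    """Exact one-sided paired sign-flip test for mean(a - b) > 0.

    Under the null hypothesis the sign of each paired difference is arbitrary,
    so every one of the 2**n sign assignments is enumerated.
    """
    diffs = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    observed = diffs.mean()
    count = 0
    total = 0
    for signs in itertools.product((1.0, -1.0), repeat=len(diffs)):
        total += 1
        # Small tolerance guards against floating-point ties.
        if np.mean(diffs * np.array(signs)) >= observed - 1e-12:
            count += 1
    return count / total


# Hypothetical per-seed decoding scores (NOT values from the paper).
curriculum = [0.56, 0.57, 0.55, 0.58, 0.56]
scratch = [0.50, 0.52, 0.48, 0.51, 0.49]

print(paired_sign_flip_pvalue(curriculum, scratch))  # 0.03125
```

With five seeds the smallest attainable p-value is 1/32, which is why seed count limits how strong a claim such a test can support.
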
Referee: [Abstract] No evidence is supplied that the skewness/kurtosis-selected neurons exhibit lower trial-to-trial variability, higher stimulus mutual information, or lower masked-reconstruction loss than the complementary stochastic group; without such validation the reported gains could arise from reduced effective dataset size or training schedule rather than the claimed biological grounding.

Authors: This observation is correct: the current manuscript demonstrates downstream performance gains and scaling behavior but does not directly compare trial-to-trial variability, stimulus mutual information, or masked-reconstruction loss between the skewness/kurtosis-selected regular neurons and the stochastic complement. Consequently, alternative explanations such as effective dataset size or curriculum effects cannot be fully ruled out from the presented evidence. In the revision we will add a new analysis subsection (with a supporting figure) that computes and reports these metrics for both groups on the Allen Brain Observatory data, together with a size-matched control experiment that subsamples the stochastic population to equal the regular subset size. This will provide the requested validation or, if the differences are smaller than expected, allow us to qualify the biological interpretation accordingly.

revision: yes
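
The size-matched control described in this response amounts to drawing, without replacement, as many stochastic neurons as there are regular ones, so that both pretraining sets have the same size. A minimal sketch follows; the function name and the toy neuron identifiers are hypothetical.

```python
import numpy as np


def size_matched_subsample(regular_ids, stochastic_ids, rng):
    """Randomly pick as many stochastic neurons as there are regular ones."""
    regular_ids = np.asarray(regular_ids)
    stochastic_ids = np.asarray(stochastic_ids)
    if len(stochastic_ids) < len(regular_ids):
        raise ValueError("stochastic population is smaller than the regular subset")
    return rng.choice(stochastic_ids, size=len(regular_ids), replace=False)


rng = np.random.default_rng(0)
regular_ids = np.arange(0, 30)       # toy identifiers: 30 regular neurons
stochastic_ids = np.arange(30, 130)  # toy identifiers: 100 stochastic neurons

control_ids = size_matched_subsample(regular_ids, stochastic_ids, rng)
print(len(control_ids))  # 30
```

Repeating the draw over several seeds would show whether any gain depends on which stochastic neurons happen to be sampled.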

Circularity Check

0 steps flagged

No significant circularity detected

The paper's central contribution is an empirical curriculum (POYO-CAP) that selects neurons via the independent statistical measures of skewness and kurtosis for initial pretraining, then fine-tunes on the full population, with performance gains measured against from-scratch baselines on the external Allen Brain Observatory dataset. No equations, definitions, or self-citations reduce the reported 12-13% improvements or scaling behavior to fitted parameters or inputs defined within the method itself. The partitioning criterion and auxiliary supervision are chosen independently of the final decoding metric, and the result remains falsifiable on held-out data without tautological reduction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that statistical moments can separate regular from stochastic neurons and that this separation yields a beneficial training order. The selection thresholds are the only free parameter, and the abstract does not state their values; no new entities are introduced.

free parameters (1)

skewness and kurtosis selection thresholds
Used to identify statistically regular neurons; concrete values are required for the curriculum but not stated in the abstract.

axioms (1)

domain assumption: Heterogeneity in neural calcium responses can be captured by skewness and kurtosis to distinguish statistically regular from stochastic neurons.
This classification underpins the entire pretraining curriculum and data-selection step.

pith-pipeline@v0.9.0 · 5715 in / 1374 out tokens · 58075 ms · 2026-05-18T05:19:08.302978+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "We introduce POYO-CAP ... first trains with masked reconstruction plus lightweight auxiliary supervision on statistically regular neurons, identified via skewness and kurtosis"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "skewness≤3.51, kurtosis≤22.62 ... predictable subset comprising four CRE lines: SST, VIP, PVALB, and NTSR1"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.