An extremely coarse feedback signal is sufficient for learning human-aligned visual representations

Michael F. Bonner; Yash Mehta

arxiv: 2605.05556 · v1 · submitted 2026-05-07 · 💻 cs.CV

Pith reviewed 2026-05-08 15:08 UTC · model grok-4.3

keywords: networks, representations, human, models, signal, trained, visual, alignment

The pith

Neural networks trained on extremely coarse 8-category classification tasks learn visual representations that align with human and primate vision at least as well as fine-grained or self-supervised models and better match human perceptual similarity judgments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Artificial neural networks trained on visual tasks often develop internal patterns that resemble how the primate brain processes images. For years researchers have used increasingly detailed training signals, such as naming 1,000 specific object classes or using self-supervised methods that treat every image as unique. This study asks whether that level of detail is actually needed. The authors created training tasks at different levels of coarseness by partitioning a set of images into 2, 4, 8, 16, and so on up to 64 categories. The partitions came from principal component analysis of embeddings from an existing pretrained model rather than from human labels. Hundreds of networks, including both convolutional and transformer architectures, were trained on these tasks. The learned representations inside the networks were then compared to real brain data: electrical recordings from macaque visual cortex and functional MRI responses from human participants. The central result is that networks trained with only 8 broad categories produced representations that matched the brain data as well as or better than the fine-grained models. These coarse models also came closest to predicting which images humans judge as similar or different. The work suggests that very broad feedback signals can be enough to produce human-like visual representations in artificial systems.
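The partitioning step described above can be illustrated with a minimal sketch. The source only says the categories come from "PCA-based splits of pretrained embeddings"; the recursive median split along the top principal component used here is an assumption made for illustration, and the function name `pca_split_labels` and its parameters are hypothetical, not the authors' code.

```python
import numpy as np

def pca_split_labels(embeddings, depth):
    """Assign each row of `embeddings` to one of 2**depth groups.

    ASSUMPTION: groups are formed by recursive median splits along the
    top principal component of each subset. The paper may use a
    different PCA-based rule.
    """
    n = embeddings.shape[0]
    labels = np.zeros(n, dtype=np.int64)

    def split(indices, level):
        if level == depth or len(indices) < 2:
            return
        x = embeddings[indices]
        x = x - x.mean(axis=0)  # center before PCA
        # Rows of vt are principal directions; vt[0] is the top one.
        _, _, vt = np.linalg.svd(x, full_matrices=False)
        scores = x @ vt[0]
        order = np.argsort(scores, kind="stable")
        half = len(indices) // 2
        left = indices[order[:half]]
        right = indices[order[half:]]
        # Record this split as one bit of the final label.
        labels[right] |= 1 << (depth - level - 1)
        split(left, level + 1)
        split(right, level + 1)

    split(np.arange(n), 0)
    return labels

# Usage with synthetic stand-in embeddings (not real model features).
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(256, 16))
labels_8way = pca_split_labels(fake_embeddings, depth=3)  # 2**3 = 8 groups
```

Because the sign of a principal component is arbitrary, the label IDs carry no meaning beyond group membership. Under this scheme, `depth` values of 1 through 6 give the 2 through 64 categories mentioned in the summary.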

Core claim

Networks trained to distinguish as few as 8 broad categories learn representations that match or exceed the neural alignment of models distinguishing 1,000 classes. Even more strikingly, these coarsely trained networks align more closely with human perceptual similarity judgments than all other models evaluated, including networks trained with fine-grained supervision or self-supervision as well as leading large-scale vision models.

Load-bearing premise

The PCA-based partitioning of pretrained embeddings produces semantically meaningful broad categories that do not inherit biases or representational structure from the original pretrained model used to create the splits.

Original abstract

Artificial neural networks trained on visual tasks develop internal representations resembling those of the primate visual system, a discovery that has guided a decade of computational neuroscience. Research on building brain-aligned models has progressively embraced finer-grained supervisory signals, from object classification to contrastive self-supervised objectives that maximize distinctions among individual images, yet the role of supervisory signal granularity on brain alignment remains largely unexamined. Here we systematically investigate how the coarseness of a learning signal shapes representational alignment with human vision. We parametrically vary the level of signal granularity using a data-driven approach that partitions a set of training images into varied numbers of categories (2, 4, 8, 16, ..., 64) via PCA-based splits of pretrained embeddings. We train hundreds of neural networks across convolutional and transformer architectures on these coarse classification tasks and compare their representations to macaque electrophysiology recordings and human fMRI responses. We find that networks trained to distinguish as few as 8 broad categories learn representations that match or exceed the neural alignment of models distinguishing 1,000 classes. Even more strikingly, these coarsely trained networks align more closely with human perceptual similarity judgments than all other models evaluated, including networks trained with fine-grained supervision or self-supervision as well as leading large-scale vision models. These results demonstrate that human-like visual representations emerge from remarkably coarse feedback, reframing what learning signals vision may require and opening a path toward building AI systems that are more aligned with human perception.
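The abstract does not say which metric is used to compare network representations with neural recordings. As one common option, and purely as an assumption here, the sketch below uses representational similarity analysis (RSA): build a dissimilarity matrix over stimuli for each system, then rank-correlate the two matrices. All function names are hypothetical, and the arrays are synthetic stand-ins rather than real model or brain data.

```python
import numpy as np

def rdm(responses):
    """Dissimilarity matrix (1 - Pearson r) between stimuli.

    responses: array of shape (n_stimuli, n_features).
    """
    return 1.0 - np.corrcoef(responses)

def upper_triangle(mat):
    """Off-diagonal upper-triangle entries as a flat vector."""
    i, j = np.triu_indices(mat.shape[0], k=1)
    return mat[i, j]

def ordinal_rank(values):
    """Ordinal ranks; ties are broken by position, not averaged."""
    ranks = np.empty(len(values))
    ranks[np.argsort(values, kind="stable")] = np.arange(len(values))
    return ranks

def rsa_score(model_features, brain_responses):
    """Spearman-style correlation between the two RDMs.

    ASSUMPTION: RSA is used as the alignment metric for illustration;
    the paper may use encoding models or another measure.
    """
    a = ordinal_rank(upper_triangle(rdm(model_features)))
    b = ordinal_rank(upper_triangle(rdm(brain_responses)))
    return float(np.corrcoef(a, b)[0, 1])

# Usage: 20 stimuli, 50 model units, 30 recorded channels (all synthetic).
rng = np.random.default_rng(0)
model_features = rng.normal(size=(20, 50))
brain_responses = rng.normal(size=(20, 30))
score = rsa_score(model_features, brain_responses)
```

Both inputs must be indexed by the same stimuli in the same order; the feature dimensions may differ, which is what makes this style of comparison convenient across networks and recording modalities.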

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Coarse 8-category training from PCA splits matches or beats finer supervision for brain alignment, but the category construction likely inherits fine-grained structure from the pretrained model, so the result does not cleanly show that coarseness alone suffices.

The main thing to know is that networks trained to distinguish just 8 broad categories end up with representations that align as well as or better than 1000-class models on macaque electrophysiology and human fMRI, and they even outperform fine-grained and self-supervised models on human perceptual similarity judgments. The paper supports this with a parametric sweep over category counts from 2 to 64 and hundreds of trained networks across architectures.

Referee Report

2 major / 2 minor

Summary. The paper claims that neural networks trained on extremely coarse visual classification tasks (as few as 8 categories, obtained via PCA-based partitioning of pretrained embeddings) develop internal representations that align with macaque electrophysiology and human fMRI data as well as or better than standard 1000-class supervised models, self-supervised models, or large-scale vision models. It further reports that these coarsely supervised networks match human perceptual similarity judgments more closely than any other evaluated model, concluding that human-aligned visual representations can emerge from remarkably coarse supervisory signals.

Significance. If the central empirical result holds after addressing potential confounds in category construction, the finding would be significant for computational neuroscience and vision modeling: it would indicate that fine-grained labels or contrastive objectives are not necessary for brain-like representations, potentially simplifying training pipelines for aligned AI systems while challenging prevailing assumptions about the granularity required for primate-like visual cortex modeling. The scale of the experiments (hundreds of networks across CNN and transformer architectures, direct comparisons to electrophysiological and fMRI recordings) provides a broad empirical basis that strengthens the result if controls are adequate.

major comments (2)

[Methods (PCA-based splits)] Methods section on PCA-based category generation: the 8-way (and other coarse) partitions are derived from PCA on embeddings of a pretrained network (presumably ImageNet-trained). This procedure risks inheriting fine-grained semantic structure into the 'coarse' labels, so the observed alignment advantage may reflect the representational geometry of the original high-capacity model rather than the sufficiency of low-granularity supervision per se. The central claim that 'an extremely coarse feedback signal is sufficient' is load-bearing on this point; without a control using randomly assigned or semantically independent 8-way labels, the result cannot isolate granularity from inherited structure.
[Results (human perceptual similarity)] Results on human perceptual similarity judgments: the claim that coarsely trained networks outperform all other models (including self-supervised and large-scale ones) in matching human judgments requires explicit reporting of effect sizes, confidence intervals, and correction for the large number of architecture-granularity comparisons performed. If the advantage is driven primarily by the particular PCA-derived partitions, the headline result on human alignment would need qualification.
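The two major comments ask for a label control and for uncertainty estimates with multiple-comparison correction. A minimal sketch of what those could look like follows. Everything here is hypothetical illustration: the function names, the choice of a paired percentile bootstrap, and the choice of the Holm-Bonferroni procedure are assumptions, not methods reported in the paper.

```python
import numpy as np

def shuffled_labels(labels, seed=0):
    """Control labels: random reassignment that preserves class sizes."""
    rng = np.random.default_rng(seed)
    return rng.permutation(labels)

def paired_bootstrap_ci(scores_a, scores_b, n_boot=2000, alpha=0.05, seed=0):
    """Mean paired difference with a percentile bootstrap interval.

    scores_a, scores_b: alignment scores for two models measured on the
    same units (for example, the same subjects or the same seeds).
    Returns (mean_difference, lower, upper).
    """
    rng = np.random.default_rng(seed)
    diff = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    # Resample paired units with replacement.
    idx = rng.integers(0, len(diff), size=(n_boot, len(diff)))
    means = diff[idx].mean(axis=1)
    lower, upper = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(diff.mean()), float(lower), float(upper)

def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down; True where the null is rejected."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    for rank, i in enumerate(np.argsort(p)):
        if p[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p-values are retained as well
    return reject

# Usage with made-up numbers (not results from the paper).
pca_labels = np.repeat(np.arange(8), 32)
control = shuffled_labels(pca_labels)
decisions = holm_reject([0.01, 0.04, 0.03])
```

Training a matched set of networks on `control` and comparing their alignment scores against the PCA-label networks would be one way to separate the effect of granularity from structure inherited through the embeddings.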

minor comments (2)

[Abstract / Introduction] The abstract and introduction should clarify which specific pretrained model supplies the embeddings for the PCA splits and whether any ablation was performed with alternative embedding sources.
[Methods / Figures] Figure legends and methods should explicitly state the number of random seeds, training epochs, and hyperparameter search procedure used for the hundreds of networks to allow reproducibility assessment.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that PCA splits of pretrained embeddings yield unbiased broad categories and that the chosen neural alignment metrics (electrophysiology and fMRI) are appropriate proxies for human vision.

free parameters (1)

number of categories
Parametrically varied from 2 to 64 to test effects of granularity

axioms (1)

domain assumption: PCA-based splits of pretrained embeddings produce meaningful broad semantic categories
Used to generate the coarse classification tasks in a data-driven manner

pith-pipeline@v0.9.0 · 5557 in / 1334 out tokens · 59985 ms · 2026-05-08T15:08:14.678075+00:00 · methodology
