arxiv: 2405.07987 · v5 · submitted 2024-05-13 · 💻 cs.LG · cs.AI · cs.CV · cs.NE

The Platonic Representation Hypothesis

Minyoung Huh , Brian Cheung , Tongzhou Wang , Phillip Isola

Pith reviewed 2026-05-15 05:58 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CV · cs.NE

keywords representation convergence · platonic representation · neural network representations · multimodal alignment · vision language models · statistical model of reality · scaling laws · representation similarity

The pith

Representations learned by different neural networks are converging toward a shared statistical model of reality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper collects examples showing that neural networks trained on different tasks, architectures, and data are producing more aligned internal representations as time passes and models grow larger. In particular, vision models and language models are shown to measure distances between data points in increasingly similar ways once they reach sufficient scale. The authors hypothesize that this trend reflects movement toward a single underlying statistical description of the world, which they name the platonic representation. If the hypothesis holds, many observed compatibilities between seemingly unrelated models would be explained by their shared approach to this common structure rather than by coincidence or identical training conditions.
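To make "measuring distances between data points in increasingly similar ways" concrete, here is a minimal sketch of one possible alignment score: the average overlap between each item's k nearest neighbours in two embedding spaces. The choice of a mutual k-nearest-neighbour score, the function names, and the toy data are assumptions for illustration; this summary does not state which metric the paper's measurements use.

```python
import math


def knn_indices(points, k):
    """For each point, return the index set of its k nearest neighbours (Euclidean)."""
    neighbours = []
    for i, p in enumerate(points):
        # Distances to every other point, excluding the point itself.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append({j for _, j in dists[:k]})
    return neighbours


def mutual_knn_alignment(emb_a, emb_b, k):
    """Mean fraction of shared k-nearest neighbours across two embedding spaces.

    emb_a[i] and emb_b[i] must describe the same underlying item
    (for example an image and its caption). Returns a value in [0, 1].
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("both spaces must embed the same items")
    nn_a = knn_indices(emb_a, k)
    nn_b = knn_indices(emb_b, k)
    overlaps = [len(a & b) / k for a, b in zip(nn_a, nn_b)]
    return sum(overlaps) / len(overlaps)


# Hypothetical toy data: space_b is a rescaled copy of space_a embedded in 3-D,
# so every item keeps exactly the same neighbours.
space_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
space_b = [(2 * x, 2 * y, 0.0) for x, y in space_a]
score = mutual_knn_alignment(space_a, space_b, k=2)
print(score)
```

A score of 1.0 means both spaces agree on every item's neighbourhood. Because only neighbour identity matters, the score is unaffected by rescaling and does not require the two spaces to have the same dimensionality.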

Core claim

We argue that representations in AI models are converging. Evidence includes growing alignment across time, domains, and modalities, with vision and language networks measuring distances between points more alike as they scale. We hypothesize that this convergence is driven toward a shared statistical model of reality, which we term the platonic representation, and we outline possible selective pressures favoring it along with implications and counterexamples.

What carries the argument

The platonic representation: the hypothesized common statistical model of reality that separate neural networks approach as they scale.

If this is right

Models trained independently will exhibit greater interoperability and zero-shot transfer as they grow larger.
Distance-based tasks such as retrieval or clustering will produce more consistent results across modalities.
The space of possible learned representations narrows with scale, limiting the diversity of internal world models.
Counterexamples will become rarer but will still exist for models trained on narrow or non-natural data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the trend continues, new modalities such as audio or robotics may show the same cross-modal alignment once models reach comparable scale.
The hypothesis suggests that apparent differences between model families are transient and will diminish with additional compute and data.
A practical test would be to measure whether the top principal components of scaled vision and language embeddings become progressively more linearly mappable to each other.
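The linear-mappability test in the last item could be sketched as follows: fit a least-squares linear map from one set of embeddings to the other and report the fraction of variance explained. This is a hedged illustration, not a procedure from the paper. It assumes the inputs are already the top principal-component scores for paired items, and the helper names (`linear_map_r2`, `solve`) and toy data are hypothetical.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ w = b for w by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; add samples or regularise")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def center(m):
    means = [sum(col) / len(m) for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def linear_map_r2(x, y):
    """Fit y ~ x @ w by least squares on centred data and return R^2."""
    xc, yc = center(x), center(y)
    xt = transpose(xc)
    w = solve(matmul(xt, xc), matmul(xt, yc))  # normal equations
    pred = matmul(xc, w)
    ss_res = sum((p - t) ** 2 for pr, tr in zip(pred, yc) for p, t in zip(pr, tr))
    ss_tot = sum(t ** 2 for tr in yc for t in tr)
    if ss_tot == 0:
        raise ValueError("target has no variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: four paired items.
x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_linear = [[2 * a + b, a - b] for a, b in x]       # exact linear image of x
y_xor = [[(2 * a - 1) * (2 * b - 1)] for a, b in x]  # no linear relation to x
r2_linear = linear_map_r2(x, y_linear)
r2_xor = linear_map_r2(x, y_xor)
```

Under this sketch, the proposed test amounts to checking whether the R² value rises as the paired models grow.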

Load-bearing premise

That increasing similarity between representation spaces indicates convergence to an objective underlying model rather than shared inductive biases, overlapping training data, or architectural commonalities.

What would settle it

A demonstration that further scaling of vision and language models causes their pairwise distance measurements to diverge or stabilize at different alignments instead of continuing to match more closely.

Original abstract

We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple domains, the ways by which different neural networks represent data are becoming more aligned. Next, we demonstrate convergence across data modalities: as vision models and language models get larger, they measure distance between datapoints in a more and more alike way. We hypothesize that this convergence is driving toward a shared statistical model of reality, akin to Plato's concept of an ideal reality. We term such a representation the platonic representation and discuss several possible selective pressures toward it. Finally, we discuss the implications of these trends, their limitations, and counterexamples to our analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Representations from scaled models are aligning on distance metrics across modalities, but the Platonic convergence claim lacks controls that would separate it from shared data or bias explanations.

The paper's main contribution is a survey of cases where neural representations have grown more similar over time and across domains, plus a new set of measurements showing that larger vision and language models agree more closely on pairwise distances between datapoints. That cross-modal alignment trend is the clearest new piece of evidence they add, and the plots make the pattern easy to see. The survey itself is thorough and pulls together scattered prior results on representation similarity, which saves readers time tracking the pattern.

The Platonic framing, that this alignment reflects convergence on an objective statistical model of reality, is presented as a hypothesis rather than a proven result. The authors discuss possible selective pressures, but the argument does not include ablations that hold training data or objectives fixed while varying only the target model of reality. Without those checks, shared corpora or architectural priors remain plausible alternative drivers for the observed alignment. The selective-pressures section also risks circularity by invoking the same convergence it sets out to explain.

This is a useful read for anyone working on representation learning, scaling trends, or cross-modal transfer. It is not a new derivation or a controlled empirical result, so it will not change what most labs cite in their methods, but the compiled examples and distance measurements are worth having in one place. A serious editor should send it to peer review as a perspective piece, with the expectation that referees will ask for tighter tests of the central interpretation.

Referee Report

3 major / 2 minor

Summary. The paper surveys examples of converging representations in neural networks across time, domains, and modalities; presents new measurements showing that larger vision and language models increasingly align in how they measure distances between datapoints; hypothesizes that this reflects convergence toward a shared statistical model of reality termed the 'platonic representation'; discusses possible selective pressures; and outlines implications, limitations, and counterexamples.

Significance. If supported by stronger controls and falsifiable tests, the hypothesis could provide a unifying lens for representation learning, with implications for scaling laws, cross-modal transfer, and the emergence of shared world models in AI. The work compiles existing literature effectively but adds only limited new measurements without distinguishing the claimed mechanism from alternatives.

major comments (3)

[modality-convergence demonstration] The cross-modal distance alignment demonstration lacks ablations that hold training data, objectives, or architectures fixed while varying only the underlying 'reality model'; without these, the measurements cannot rule out shared inductive biases or corpus overlap as the primary driver (see the modality-convergence section and associated figures).
[selective pressures section] The selective-pressures discussion invokes the platonic-representation hypothesis to explain the same alignment observations used to motivate it, creating a circularity that weakens the causal claim; additional independent evidence or a formal model of the pressures is needed (see the selective pressures section).
[implications and limitations sections] No quantitative bounds, falsifiable predictions, or controlled experiments are provided that would distinguish convergence to an objective statistical model from simpler alternatives such as scaling-induced similarity; this leaves the central hypothesis underdetermined (see the implications and limitations sections).

minor comments (2)

[Abstract] The abstract introduces the 'platonic representation' without a concise operational definition or mathematical characterization that could be referenced later.
[figures] Figure captions and axis labels in the distance-alignment plots could more explicitly state the distance metric and normalization used.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights opportunities to strengthen the empirical claims and clarify the logical structure of the hypothesis. We address each major comment below, proposing targeted revisions to the manuscript where appropriate while defending the core contributions on substantive grounds.

Referee: The cross-modal distance alignment demonstration lacks ablations that hold training data, objectives, or architectures fixed while varying only the underlying 'reality model'; without these, the measurements cannot rule out shared inductive biases or corpus overlap as the primary driver (see the modality-convergence section and associated figures).

Authors: We agree that controlled ablations isolating the 'reality model' from inductive biases and data overlap would provide stronger causal evidence. Such experiments are computationally intensive for frontier-scale models and were not feasible within the scope of this work. However, the observed alignment trend holds across independently developed models (e.g., different vision transformers and language models trained by separate groups on partially overlapping but non-identical corpora), and we cite supporting literature on convergence under varied objectives. In revision, we will expand the modality-convergence section to explicitly discuss these potential confounds, include additional controls where smaller-scale proxies are available, and qualify the interpretation accordingly.

Revision: partial

Referee: The selective-pressures discussion invokes the platonic-representation hypothesis to explain the same alignment observations used to motivate it, creating a circularity that weakens the causal claim; additional independent evidence or a formal model of the pressures is needed (see the selective pressures section).

Authors: The selective pressures section is framed as a hypothesis-generating discussion rather than a causal proof. The alignment observations motivate the platonic representation as a unifying description; the pressures (e.g., optimization for generalization, data efficiency, and cross-task transfer) are proposed mechanisms drawn from independent scaling-law results and representation-learning theory. To reduce any perceived circularity, we will revise the section to separate the descriptive hypothesis from the mechanistic discussion, add citations to prior work on inductive biases that predate our measurements, and clarify that the pressures are testable via future interventions such as controlled training regimes.

Revision: partial

Referee: No quantitative bounds, falsifiable predictions, or controlled experiments are provided that would distinguish convergence to an objective statistical model from simpler alternatives such as scaling-induced similarity; this leaves the central hypothesis underdetermined (see the implications and limitations sections).

Authors: We acknowledge that the current version presents the hypothesis primarily through qualitative trends and literature synthesis rather than quantitative bounds or explicit falsification tests. In the revised manuscript, we will add a dedicated subsection in Implications that articulates concrete, measurable predictions (e.g., expected growth in cross-modal distance correlation as a function of parameter count, and thresholds beyond which scaling alone cannot explain residual alignment). We will also expand Limitations to contrast the platonic account against pure scaling-induced similarity and outline feasible controlled experiments using matched smaller models.

Revision: yes
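The "cross-modal distance correlation" mentioned in the last response can be given one concrete reading: the Spearman rank correlation between the pairwise distances that two models assign to the same items. The sketch below assumes that reading. The function names and toy embeddings are hypothetical, and the rebuttal does not say which correlation the authors intend.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Flattened upper triangle of the Euclidean distance matrix."""
    return [math.dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)]


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = avg
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation; assumes neither input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def distance_rank_correlation(emb_a, emb_b):
    """Spearman correlation between two spaces' pairwise distances."""
    return pearson(ranks(pairwise_distances(emb_a)),
                   ranks(pairwise_distances(emb_b)))


# Hypothetical toy data: emb_b is emb_a scaled by 3, so distance ranks match.
emb_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (4.0, 4.0)]
emb_b = [(3 * x, 3 * y) for x, y in emb_a]
rho = distance_rank_correlation(emb_a, emb_b)
```

Computing this score for each vision-language pair and plotting it against parameter count would give the kind of trend the proposed prediction refers to.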

Circularity Check

0 steps flagged

No significant circularity; hypothesis framed around external observations

The paper surveys existing literature on representation convergence across models and domains, then reports new cross-modal measurements showing increasing alignment in distance structures between vision and language models as scale increases. The platonic representation is introduced explicitly as a hypothesis to interpret these trends, followed by discussion of possible selective pressures and implications. No derivation step reduces by construction to its own inputs: there are no fitted parameters renamed as predictions, no self-definitional loops where the target quantity is presupposed in the premise, and no load-bearing uniqueness theorems imported solely via self-citation. The central claim remains an interpretive hypothesis whose support is drawn from cited external results and the reported measurements rather than tautological re-expression of the same data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that representation similarity metrics capture meaningful convergence to an objective reality model. No free parameters are fitted. The platonic representation is introduced as a new conceptual entity without independent falsifiable evidence.

axioms (1)

domain assumption: Representations learned by neural networks can be compared across models and modalities using distance metrics that reflect semantic similarity.
Invoked when claiming convergence from alignment of distance measures.

invented entities (1)

platonic representation (no independent evidence)
purpose: A shared statistical model of reality that different neural networks converge toward.
Postulated to explain observed alignment; no independent evidence or falsifiable prediction is provided beyond the alignment itself.

pith-pipeline@v0.9.0 · 5419 in / 1145 out tokens · 41970 ms · 2026-05-15T05:58:39.603476+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

We hypothesize that this convergence is driving toward a shared statistical model of reality, akin to Plato's concept of an ideal reality. We term such a representation the platonic representation.
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

The Multitask Scaling Hypothesis: There are fewer representations that are competent for N tasks than there are for M < N tasks. As we train more general models that solve more tasks at once, we should expect fewer possible solutions.
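One way to read the quoted hypothesis formally is as a statement about nested solution sets. The notation below is illustrative and not taken from this page: $\mathcal{T}_M \subseteq \mathcal{T}_N$ are task sets with $M < N$ tasks (nesting is an added assumption), and $\mathcal{R}(\mathcal{T})$ denotes the set of representations competent for every task in $\mathcal{T}$.

```latex
\[
\mathcal{T}_M \subseteq \mathcal{T}_N
\;\Longrightarrow\;
\mathcal{R}(\mathcal{T}_N)
  = \bigcap_{t \in \mathcal{T}_N} \mathcal{R}(\{t\})
\;\subseteq\;
\bigcap_{t \in \mathcal{T}_M} \mathcal{R}(\{t\})
  = \mathcal{R}(\mathcal{T}_M)
\]
```

Under this reading, adding tasks can only shrink or preserve the set of competent representations. The strict "fewer" in the quoted passage additionally requires that at least one of the extra tasks rules out a representation that was competent for the smaller set.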

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 22 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Deep Learning as Neural Low-Degree Filtering: A Spectral Theory of Hierarchical Feature Learning
cs.LG · 2026-05 · unverdicted · novelty 8.0

Neural LoFi models deep learning as layer-wise spectral filtering that selects maximal low-degree correlations, yielding a tractable surrogate for hierarchical representation learning beyond the lazy regime.

CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios
cs.CR · 2026-05 · unverdicted · novelty 7.0

LLM agents exhibit persistent attack-selection biases as fixed traits independent of success rates, with a bias momentum effect that resists steering and yields no performance gain.

Dimensional Coactivation for Representational Consistency in Frozen Vision Foundation Models
cs.CV · 2026-05 · unverdicted · novelty 7.0

DCA measures intra-sample representational consistency in frozen vision models by checking per-dimension coactivation across regions, achieving 0.91-0.93 AUC in deepfake detection with DINOv3 features.

Rethinking Model Selection in VLM Through the Lens of Gromov-Wasserstein Distance
cs.CV · 2026-05 · unverdicted · novelty 7.0

Gromov-Wasserstein distance between modalities provides a stronger, inference-only predictor of final VLM performance than conventional encoder metrics, backed by theory linking it to cross-modal learnability and veri...

Geometry-Aware CLIP Retrieval via Local Cross-Modal Alignment and Steering
cs.CV · 2026-04 · unverdicted · novelty 7.0

Neighborhood re-ranking via Hungarian matching and query-conditioned local steering improve CLIP retrieval on attribute-binding and compositional tasks by addressing local geometric inconsistencies.

Positive Alignment: Artificial Intelligence for Human Flourishing
cs.AI · 2026-05 · unverdicted · novelty 6.0

Positive Alignment introduces AI systems that support human flourishing pluralistically and proactively while remaining safe, as a necessary complement to traditional safety-focused alignment research.

What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion
cs.CV · 2026-05 · unverdicted · novelty 6.0

Prior-Aligned AutoEncoders shape latent manifolds with spatial coherence, local continuity, and global semantics to improve latent diffusion, achieving SOTA gFID 1.03 on ImageNet 256x256 with up to 13x faster convergence.

Emergent Symbolic Structure in Health Foundation Models: Extraction, Alignment, and Cross-Modal Transfer
cs.LG · 2026-05 · unverdicted · novelty 6.0

Health foundation model embeddings contain an interpretable symbolic organization shared across modalities that supports cross-domain transfer without joint training.

Compared to What? Baselines and Metrics for Counterfactual Prompting
cs.CL · 2026-05 · conditional · novelty 6.0

Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistica...

Borrowed Geometry: Computational Reuse of Frozen Text-Pretrained Transformer Weights Across Modalities
cs.LG · 2026-05 · unverdicted · novelty 6.0

Frozen text-pretrained transformer weights transfer across modalities through a thin interface, achieving SOTA on a robotic task and parity on decision-making with far fewer trainable parameters.

Human Cognition in Machines: A Unified Perspective of World Models
cs.RO · 2026-04 · unverdicted · novelty 6.0

The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and pro...

The Cost of Language: Centroid Erasure Exposes and Exploits Modal Competition in Multimodal Language Models
cs.CL · 2026-04 · unverdicted · novelty 6.0

Centroid erasure shows language representations overshadow vision in multimodal models, and text-centroid contrastive decoding recovers substantial accuracy on visual reasoning tasks.

Dead Weights, Live Signals: Feedforward Graphs of Frozen Language Models
cs.LG · 2026-04 · unverdicted · novelty 6.0

A feedforward graph of heterogeneous frozen LLMs linked by linear projections in a shared latent space outperforms single models on ARC-Challenge, OpenBookQA, and MMLU using just 17.6M trainable parameters.

The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment
cs.LG · 2026-04 · unverdicted · novelty 6.0

The Master Key Hypothesis states that capabilities are low-dimensional directions transferable across models through linear subspace alignment, with UNLOCK demonstrating gains such as 12.1% accuracy improvement on MAT...

Rethinking the Good Enough Embedding for Easy Few-Shot Learning
cs.CV · 2026-05 · conditional · novelty 5.0

Frozen DINOv2-L features with k-NN classification and PCA/ICA refinement achieve state-of-the-art few-shot performance on four benchmarks without any backpropagation or fine-tuning.

Control Charts for Multi-agent Systems
cs.MA · 2026-05 · unverdicted · novelty 5.0

Adaptive control charts can monitor learning multi-agent systems but are vulnerable to gradual adversarial defection, revealing a fundamental tradeoff between allowing agents to learn and maintaining security against ...

ATLAS: Constitution-Conditioned Latent Geometry and Redistribution Across Language Models and Neural Perturbation Data
cs.LG · 2026-04 · unverdicted · novelty 5.0

ATLAS shows constitutions induce recoverable latent geometry in LLMs that redistributes but remains detectable across models and neural perturbation data via source-defined families and AUC separations.

Language-Pretraining-Induced Bias: A Strong Foundation for General Vision Tasks
cs.CV · 2026-04 · unverdicted · novelty 5.0

Random label bridge training aligns LLM parameters with vision tasks, and partial training of certain layers often suffices due to their foundational properties.

Positive Alignment: Artificial Intelligence for Human Flourishing
cs.AI · 2026-05 · unverdicted · novelty 4.0

Positive Alignment is introduced as a distinct AI agenda that supports human flourishing through pluralistic and context-sensitive design, complementing traditional safety-focused alignment.

Neuroscience-Inspired Analyses of Visual Interestingness in Multimodal Transformers
cs.CV · 2026-05 · unverdicted · novelty 4.0

Human visual interestingness is linearly decodable from final-layer embeddings in Qwen3-VL-8B and becomes progressively more structured across vision and language layers without explicit supervision.

Measuring AI Reasoning: A Guide for Researchers
cs.AI · 2026-05 · unverdicted · novelty 4.0

Reasoning in language models should be measured by the faithfulness and validity of their multi-step search processes and intermediate traces, not final-answer accuracy.

Toward Aristotelian Medical Representations: Backpropagation-Free Layer-wise Analysis for Interpretable Generalized Metric Learning on MedMNIST
cs.CV · 2026-04 · unverdicted · novelty 4.0

A-ROM delivers competitive MedMNIST performance via pretrained ViT metric spaces, a concept dictionary, and kNN without backpropagation or fine-tuning, framed as interpretable few-shot learning under the Platonic Repr...

Reference graph

Works this paper leans on

272 extracted references · 272 canonical work pages · cited by 21 Pith papers · 35 internal anchors


Best-buddies similarity—Robust template matching using mutual nearest neighbors , author=. IEEE transactions on pattern analysis and machine intelligence , volume=. 2017 , publisher=

work page 2017
[65]

1897 , publisher=

Lectures on the `Republic' of Plato , author=. 1897 , publisher=

work page
[66]

Journal of Machine Learning Research , volume=

Optimal convergence rates for convex distributed optimization in networks , author=. Journal of Machine Learning Research , volume=

work page
[67]

The Philosophical Quarterly (1950-) , volume=

Three kinds of scientific realism , author=. The Philosophical Quarterly (1950-) , volume=. 1982 , publisher=

work page 1950
[68]

Philosophy of Science , volume=

Reconstructing scientific realism to rebut the pessimistic meta-induction , author=. Philosophy of Science , volume=. 2007 , publisher=

work page 2007
[69]

Philosophy of Science , volume=

In defense of convergent realism , author=. Philosophy of Science , volume=. 1982 , publisher=

work page 1982
[70]

1981 , publisher=

The Rationality of Science , author=. 1981 , publisher=

work page 1981
[71]

International Conference on Machine Learning , pages=

Geometry of the loss landscape in overparameterized neural networks: Symmetries and invariances , author=. International Conference on Machine Learning , pages=. 2021 , organization=

work page 2021
[72]

Neural networks , volume=

Local minima and plateaus in hierarchical structures of multilayer perceptrons , author=. Neural networks , volume=. 2000 , publisher=

work page 2000
[73]

Weight-space symmetry in deep networks gives rise to permutation saddles, connected by equal-loss valleys across the loss landscape

Weight-space symmetry in deep networks gives rise to permutation saddles, connected by equal-loss valleys across the loss landscape , author=. arXiv preprint arXiv:1907.02911 , year=

work page internal anchor Pith review Pith/arXiv arXiv 1907
[74]

Advances in Neural Information Processing Systems , volume=

Optimizing mode connectivity via neuron alignment , author=. Advances in Neural Information Processing Systems , volume=

work page
[77]

Nature Communications , volume=

Segment anything in medical images , author=. Nature Communications , volume=. 2024 , publisher=

work page 2024
[78]

arXiv preprint arXiv:2110.06296 , year=

The role of permutation invariance in linear mode connectivity of neural networks , author=. arXiv preprint arXiv:2110.06296 , year=

work page arXiv
[79]

Advances in Neural Information Processing Systems , volume=

Large scale structure of neural network loss landscapes , author=. Advances in Neural Information Processing Systems , volume=

work page
[80]

1985 , institution=

Learning internal representations by error propagation , author=. 1985 , institution=

work page 1985
[81]

Advances in Neural Information Processing Systems , volume=

Bad global minima exist and sgd can reach them , author=. Advances in Neural Information Processing Systems , volume=

work page
[82]

Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

Understanding image representations by measuring their equivariance and equivalence , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

work page
[84]

Advances in neural information processing systems , volume=

Revisiting model stitching to compare neural representations , author=. Advances in neural information processing systems , volume=

work page
[85]

arXiv preprint arXiv:2305.11206 , year=

LIMA: Less Is More for Alignment , author=. arXiv preprint arXiv:2305.11206 , year=

work page arXiv
[86]

International Conference on Learning Representations , year=

Effect of scale on catastrophic forgetting in neural networks , author=. International Conference on Learning Representations , year=

work page
[87]

arXiv preprint arXiv:2112.09153 , year=

An empirical investigation of the role of pre-training in lifelong learning , author=. arXiv preprint arXiv:2112.09153 , year=

work page arXiv
[88]

Computer Science

Improving image generation with better captions , author=. Computer Science. https://cdn. openai. com/papers/dall-e-3. pdf , volume=

work page
[89]

Self-correcting

Wu, Tsung-Han and Lian, Long and Gonzalez, Joseph E and Li, Boyi and Darrell, Trevor , journal=. Self-correcting

work page
[90]

Lian, Long and Shi, Baifeng and Yala, Adam and Darrell, Trevor and Li, Boyi , journal=

work page
[91]

International Conference on Machine Learning , pages=

Wide neural networks forget less catastrophically , author=. International Conference on Machine Learning , pages=. 2022 , organization=

work page 2022
[92]

LoRA: Low-Rank Adaptation of Large Language Models

Lora: Low-rank adaptation of large language models , author=. arXiv preprint arXiv:2106.09685 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[93]

International Conference on Machine Learning , pages=

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time , author=. International Conference on Machine Learning , pages=. 2022 , organization=

work page 2022
[94]

arXiv preprint arXiv:2205.12411 , year=

Linear connectivity reveals generalization strategies , author=. arXiv preprint arXiv:2205.12411 , year=

work page arXiv
[95]

Machine super intelligence , author=

work page

Showing first 80 references.