Recognition: 2 theorem links
· Lean Theorem
The Platonic Representation Hypothesis
Pith reviewed 2026-05-15 05:58 UTC · model grok-4.3
The pith
Representations learned by different neural networks are converging toward a shared statistical model of reality.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We argue that representations in AI models are converging. Evidence includes growing alignment across time, domains, and modalities, with vision and language networks measuring distances between datapoints in increasingly similar ways as they scale. We hypothesize that this convergence is driven toward a shared statistical model of reality, which we term the platonic representation, and we outline possible selective pressures favoring it, along with implications and counterexamples.
What carries the argument
The platonic representation: the hypothesized common statistical model of reality that separate neural networks approach as they scale.
If this is right
- Models trained independently will exhibit greater interoperability and zero-shot transfer as they grow larger.
- Distance-based tasks such as retrieval or clustering will produce more consistent results across modalities.
- The space of possible learned representations narrows with scale, limiting the diversity of internal world models.
- Counterexamples will become rarer but will still exist for models trained on narrow or non-natural data.
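Several of these predictions are directly measurable. A minimal sketch of one such check, a mutual k-nearest-neighbor alignment score between two embedding spaces of the same datapoints (the function name, the choice of Euclidean distance, and the default k are illustrative assumptions, not the paper's exact metric):

```python
import numpy as np

def mutual_knn_alignment(X, Y, k=3):
    """Fraction of shared k-nearest neighbors between two embedding
    spaces X (n, d1) and Y (n, d2) over the same n datapoints.
    Returns a score in [0, 1]; 1 means identical neighborhood structure."""
    def knn_sets(Z):
        d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)              # a point is not its own neighbor
        return [set(np.argsort(row)[:k]) for row in d]

    return float(np.mean([len(a & b) / k
                          for a, b in zip(knn_sets(X), knn_sets(Y))]))
```

Because the score depends only on distances, it is invariant to rotations of either space; under the hypothesis, it should rise with scale when computed between a vision model's and a language model's embeddings of paired data.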
Where Pith is reading between the lines
- If the trend continues, new modalities such as audio or robotics may show the same cross-modal alignment once models reach comparable scale.
- The hypothesis suggests that apparent differences between model families are transient and will diminish with additional compute and data.
- A practical test would be to measure whether the top principal components of scaled vision and language embeddings become progressively more linearly mappable to each other.
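The practical test in the last bullet can be sketched directly: project each embedding matrix onto its top principal components and measure how much variance a single least-squares linear map explains. This is a hedged illustration (the helper name, component count, and R-squared criterion are assumptions of this sketch, not the paper's protocol):

```python
import numpy as np

def linear_map_r2(X, Y, n_components=4):
    """R^2 of the best least-squares linear map from the top principal
    components of X to the top principal components of Y.
    Higher values mean the two spaces are more linearly mappable."""
    def top_pcs(Z, k):
        Zc = Z - Z.mean(axis=0)                  # center before PCA
        _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
        return Zc @ Vt[:k].T                     # scores on the top-k PCs

    Px, Py = top_pcs(X, n_components), top_pcs(Y, n_components)
    W, *_ = np.linalg.lstsq(Px, Py, rcond=None)  # Px @ W approximates Py
    return float(1.0 - ((Py - Px @ W) ** 2).sum() / (Py ** 2).sum())
```

The hypothesis predicts this score, computed between matched vision and language embeddings, grows monotonically with model scale.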
Load-bearing premise
That increasing similarity between representation spaces indicates convergence to an objective underlying model rather than shared inductive biases, overlapping training data, or architectural commonalities.
What would settle it
A demonstration that further scaling of vision and language models causes their pairwise distance measurements to diverge or stabilize at different alignments instead of continuing to match more closely.
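The settling experiment reduces to tracking one number across scale. An RSA-style sketch of that number, the Pearson correlation between two models' pairwise-distance structures (a hypothetical helper; the paper's actual alignment metric may differ):

```python
import numpy as np

def distance_structure_corr(X, Y):
    """Pearson correlation between the pairwise Euclidean distance
    vectors of two embedding spaces over the same n datapoints."""
    def pair_dists(Z):
        d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
        return d[np.triu_indices(len(Z), k=1)]   # each pair once, no diagonal
    return float(np.corrcoef(pair_dists(X), pair_dists(Y))[0, 1])
```

Plotting this score against parameter count for matched vision-language pairs yields the falsification curve: a continued rise supports the hypothesis, while divergence or plateaus at different levels for different model families would count against it.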
read the original abstract
We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple domains, the ways by which different neural networks represent data are becoming more aligned. Next, we demonstrate convergence across data modalities: as vision models and language models get larger, they measure distance between datapoints in a more and more alike way. We hypothesize that this convergence is driving toward a shared statistical model of reality, akin to Plato's concept of an ideal reality. We term such a representation the platonic representation and discuss several possible selective pressures toward it. Finally, we discuss the implications of these trends, their limitations, and counterexamples to our analysis.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper surveys examples of converging representations in neural networks across time, domains, and modalities; presents new measurements showing that larger vision and language models increasingly align in how they measure distances between datapoints; hypothesizes that this reflects convergence toward a shared statistical model of reality termed the 'platonic representation'; discusses possible selective pressures; and outlines implications, limitations, and counterexamples.
Significance. If supported by stronger controls and falsifiable tests, the hypothesis could provide a unifying lens for representation learning, with implications for scaling laws, cross-modal transfer, and the emergence of shared world models in AI. As it stands, the work compiles the existing literature effectively but adds only limited new measurements and does not distinguish the claimed mechanism from alternatives.
major comments (3)
- [modality-convergence demonstration] The cross-modal distance alignment demonstration lacks ablations that hold training data, objectives, or architectures fixed while varying only the underlying 'reality model'; without these, the measurements cannot rule out shared inductive biases or corpus overlap as the primary driver (see the modality-convergence section and associated figures).
- [selective pressures section] The selective-pressures discussion invokes the platonic-representation hypothesis to explain the same alignment observations used to motivate it, creating a circularity that weakens the causal claim; additional independent evidence or a formal model of the pressures is needed (see the selective pressures section).
- [implications and limitations sections] No quantitative bounds, falsifiable predictions, or controlled experiments are provided that would distinguish convergence to an objective statistical model from simpler alternatives such as scaling-induced similarity; this leaves the central hypothesis underdetermined (see the implications and limitations sections).
minor comments (2)
- [Abstract] The abstract introduces the 'platonic representation' without a concise operational definition or mathematical characterization that could be referenced later.
- [figures] Figure captions and axis labels in the distance-alignment plots could more explicitly state the distance metric and normalization used.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which highlights opportunities to strengthen the empirical claims and clarify the logical structure of the hypothesis. We address each major comment below, proposing targeted revisions to the manuscript where appropriate while defending the core contributions on substantive grounds.
read point-by-point responses
-
Referee: The cross-modal distance alignment demonstration lacks ablations that hold training data, objectives, or architectures fixed while varying only the underlying 'reality model'; without these, the measurements cannot rule out shared inductive biases or corpus overlap as the primary driver (see the modality-convergence section and associated figures).
Authors: We agree that controlled ablations isolating the 'reality model' from inductive biases and data overlap would provide stronger causal evidence. Such experiments are computationally intensive for frontier-scale models and were not feasible within the scope of this work. However, the observed alignment trend holds across independently developed models (e.g., different vision transformers and language models trained by separate groups on partially overlapping but non-identical corpora), and we cite supporting literature on convergence under varied objectives. In revision, we will expand the modality-convergence section to explicitly discuss these potential confounds, include additional controls where smaller-scale proxies are available, and qualify the interpretation accordingly. revision: partial
-
Referee: The selective-pressures discussion invokes the platonic-representation hypothesis to explain the same alignment observations used to motivate it, creating a circularity that weakens the causal claim; additional independent evidence or a formal model of the pressures is needed (see the selective pressures section).
Authors: The selective pressures section is framed as a hypothesis-generating discussion rather than a causal proof. The alignment observations motivate the platonic representation as a unifying description; the pressures (e.g., optimization for generalization, data efficiency, and cross-task transfer) are proposed mechanisms drawn from independent scaling-law results and representation-learning theory. To reduce any perceived circularity, we will revise the section to separate the descriptive hypothesis from the mechanistic discussion, add citations to prior work on inductive biases that predate our measurements, and clarify that the pressures are testable via future interventions such as controlled training regimes. revision: partial
-
Referee: No quantitative bounds, falsifiable predictions, or controlled experiments are provided that would distinguish convergence to an objective statistical model from simpler alternatives such as scaling-induced similarity; this leaves the central hypothesis underdetermined (see the implications and limitations sections).
Authors: We acknowledge that the current version presents the hypothesis primarily through qualitative trends and literature synthesis rather than quantitative bounds or explicit falsification tests. In the revised manuscript, we will add a dedicated subsection in Implications that articulates concrete, measurable predictions (e.g., expected growth in cross-modal distance correlation as a function of parameter count, and thresholds beyond which scaling alone cannot explain residual alignment). We will also expand Limitations to contrast the platonic account against pure scaling-induced similarity and outline feasible controlled experiments using matched smaller models. revision: yes
Circularity Check
No significant circularity; hypothesis framed around external observations
full rationale
The paper surveys existing literature on representation convergence across models and domains, then reports new cross-modal measurements showing increasing alignment in distance structures between vision and language models as scale increases. The platonic representation is introduced explicitly as a hypothesis to interpret these trends, followed by discussion of possible selective pressures and implications. No derivation step reduces by construction to its own inputs: there are no fitted parameters renamed as predictions, no self-definitional loops where the target quantity is presupposed in the premise, and no load-bearing uniqueness theorems imported solely via self-citation. The central claim remains an interpretive hypothesis whose support is drawn from cited external results and the reported measurements rather than tautological re-expression of the same data.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Representations learned by neural networks can be compared across models and modalities using distance metrics that reflect semantic similarity.
invented entities (1)
- platonic representation: no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
We hypothesize that this convergence is driving toward a shared statistical model of reality, akin to Plato's concept of an ideal reality. We term such a representation the platonic representation.
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
The Multitask Scaling Hypothesis: There are fewer representations that are competent for N tasks than there are for M < N tasks. As we train more general models that solve more tasks at once, we should expect fewer possible solutions.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 22 Pith papers
-
Deep Learning as Neural Low-Degree Filtering: A Spectral Theory of Hierarchical Feature Learning
Neural LoFi models deep learning as layer-wise spectral filtering that selects maximal low-degree correlations, yielding a tractable surrogate for hierarchical representation learning beyond the lazy regime.
-
CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios
LLM agents exhibit persistent attack-selection biases as fixed traits independent of success rates, with a bias momentum effect that resists steering and yields no performance gain.
-
Dimensional Coactivation for Representational Consistency in Frozen Vision Foundation Models
DCA measures intra-sample representational consistency in frozen vision models by checking per-dimension coactivation across regions, achieving 0.91-0.93 AUC in deepfake detection with DINOv3 features.
-
Rethinking Model Selection in VLM Through the Lens of Gromov-Wasserstein Distance
Gromov-Wasserstein distance between modalities provides a stronger, inference-only predictor of final VLM performance than conventional encoder metrics, backed by theory linking it to cross-modal learnability and veri...
-
Geometry-Aware CLIP Retrieval via Local Cross-Modal Alignment and Steering
Neighborhood re-ranking via Hungarian matching and query-conditioned local steering improve CLIP retrieval on attribute-binding and compositional tasks by addressing local geometric inconsistencies.
-
Positive Alignment: Artificial Intelligence for Human Flourishing
Positive Alignment introduces AI systems that support human flourishing pluralistically and proactively while remaining safe, as a necessary complement to traditional safety-focused alignment research.
-
What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion
Prior-Aligned AutoEncoders shape latent manifolds with spatial coherence, local continuity, and global semantics to improve latent diffusion, achieving SOTA gFID 1.03 on ImageNet 256x256 with up to 13x faster convergence.
-
Emergent Symbolic Structure in Health Foundation Models: Extraction, Alignment, and Cross-Modal Transfer
Health foundation model embeddings contain an interpretable symbolic organization shared across modalities that supports cross-domain transfer without joint training.
-
Compared to What? Baselines and Metrics for Counterfactual Prompting
Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistica...
-
Borrowed Geometry: Computational Reuse of Frozen Text-Pretrained Transformer Weights Across Modalities
Frozen text-pretrained transformer weights transfer across modalities through a thin interface, achieving SOTA on a robotic task and parity on decision-making with far fewer trainable parameters.
-
Human Cognition in Machines: A Unified Perspective of World Models
The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and pro...
-
The Cost of Language: Centroid Erasure Exposes and Exploits Modal Competition in Multimodal Language Models
Centroid erasure shows language representations overshadow vision in multimodal models, and text-centroid contrastive decoding recovers substantial accuracy on visual reasoning tasks.
-
Dead Weights, Live Signals: Feedforward Graphs of Frozen Language Models
A feedforward graph of heterogeneous frozen LLMs linked by linear projections in a shared latent space outperforms single models on ARC-Challenge, OpenBookQA, and MMLU using just 17.6M trainable parameters.
-
The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment
The Master Key Hypothesis states that capabilities are low-dimensional directions transferable across models through linear subspace alignment, with UNLOCK demonstrating gains such as 12.1% accuracy improvement on MAT...
-
Rethinking the Good Enough Embedding for Easy Few-Shot Learning
Frozen DINOv2-L features with k-NN classification and PCA/ICA refinement achieve state-of-the-art few-shot performance on four benchmarks without any backpropagation or fine-tuning.
-
Control Charts for Multi-agent Systems
Adaptive control charts can monitor learning multi-agent systems but are vulnerable to gradual adversarial defection, revealing a fundamental tradeoff between allowing agents to learn and maintaining security against ...
-
ATLAS: Constitution-Conditioned Latent Geometry and Redistribution Across Language Models and Neural Perturbation Data
ATLAS shows constitutions induce recoverable latent geometry in LLMs that redistributes but remains detectable across models and neural perturbation data via source-defined families and AUC separations.
-
Language-Pretraining-Induced Bias: A Strong Foundation for General Vision Tasks
Random label bridge training aligns LLM parameters with vision tasks, and partial training of certain layers often suffices due to their foundational properties.
-
Positive Alignment: Artificial Intelligence for Human Flourishing
Positive Alignment is introduced as a distinct AI agenda that supports human flourishing through pluralistic and context-sensitive design, complementing traditional safety-focused alignment.
-
Neuroscience-Inspired Analyses of Visual Interestingness in Multimodal Transformers
Human visual interestingness is linearly decodable from final-layer embeddings in Qwen3-VL-8B and becomes progressively more structured across vision and language layers without explicit supervision.
-
Measuring AI Reasoning: A Guide for Researchers
Reasoning in language models should be measured by the faithfulness and validity of their multi-step search processes and intermediate traces, not final-answer accuracy.
-
Toward Aristotelian Medical Representations: Backpropagation-Free Layer-wise Analysis for Interpretable Generalized Metric Learning on MedMNIST
A-ROM delivers competitive MedMNIST performance via pretrained ViT metric spaces, a concept dictionary, and kNN without backpropagation or fine-tuning, framed as interpretable few-shot learning under the Platonic Repr...