pith. machine review for the scientific record.

arxiv: 2405.07987 · v5 · submitted 2024-05-13 · 💻 cs.LG · cs.AI · cs.CV · cs.NE

The Platonic Representation Hypothesis

Pith reviewed 2026-05-15 05:58 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CV · cs.NE
keywords representation convergence · platonic representation · neural network representations · multimodal alignment · vision language models · statistical model of reality · scaling laws · representation similarity

The pith

Representations learned by different neural networks are converging toward a shared statistical model of reality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper collects examples showing that neural networks trained on different tasks, architectures, and data are producing more aligned internal representations as time passes and models grow larger. In particular, vision models and language models are shown to measure distances between data points in increasingly similar ways once they reach sufficient scale. The authors hypothesize that this trend reflects movement toward a single underlying statistical description of the world, which they name the platonic representation. If the hypothesis holds, many observed compatibilities between seemingly unrelated models would be explained by their shared approach to this common structure rather than by coincidence or identical training conditions.

Core claim

We argue that representations in AI models are converging. Evidence includes growing alignment across time, domains, and modalities: as vision and language networks scale, they measure distances between datapoints in increasingly similar ways. We hypothesize that this convergence is heading toward a shared statistical model of reality, which we term the platonic representation, and we outline possible selective pressures favoring it along with implications and counterexamples.

What carries the argument

The platonic representation: the hypothesized common statistical model of reality that separate neural networks approach as they scale.

If this is right

  • Models trained independently will exhibit greater interoperability and zero-shot transfer as they grow larger.
  • Distance-based tasks such as retrieval or clustering will produce more consistent results across modalities.
  • The space of possible learned representations narrows with scale, limiting the diversity of internal world models.
  • Counterexamples will become rarer but will still exist for models trained on narrow or non-natural data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the trend continues, new modalities such as audio or robotics may show the same cross-modal alignment once models reach comparable scale.
  • The hypothesis suggests that apparent differences between model families are transient and will diminish with additional compute and data.
  • A practical test would be to measure whether the top principal components of scaled vision and language embeddings become progressively more linearly mappable to each other.
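The practical test in the last bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not a procedure from the paper: the synthetic arrays `vision_like` and `text_like` stand in for real paired embeddings, and the helper names `top_principal_components` and `linear_map_r2` are hypothetical. The sketch projects each embedding set onto its top principal components, fits a least-squares linear map from one projection to the other on half the samples, and reports held-out R².

```python
import numpy as np

def top_principal_components(feats, n_components):
    """Project centered features onto their top principal directions."""
    centered = feats - feats.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def linear_map_r2(source, target, train_fraction=0.5):
    """Fit target ~ source @ W on a train split and return held-out R^2."""
    n_train = int(len(source) * train_fraction)
    weights, *_ = np.linalg.lstsq(source[:n_train], target[:n_train], rcond=None)
    held_out = target[n_train:]
    residual = np.sum((held_out - source[n_train:] @ weights) ** 2)
    total = np.sum((held_out - held_out.mean(axis=0)) ** 2)
    return float(1.0 - residual / total)

# Synthetic stand-ins (assumption): two "modalities" that are noisy linear
# views of the same 6-dimensional latent, plus an unrelated control.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 6))
vision_like = latent @ rng.normal(size=(6, 40)) + 0.05 * rng.normal(size=(300, 40))
text_like = latent @ rng.normal(size=(6, 64)) + 0.05 * rng.normal(size=(300, 64))
unrelated = rng.normal(size=(300, 64))

vision_pcs = top_principal_components(vision_like, 6)
r2_shared = linear_map_r2(vision_pcs, top_principal_components(text_like, 6))
r2_unrelated = linear_map_r2(vision_pcs, top_principal_components(unrelated, 6))
```

With real models, one would repeat this across a range of model sizes and check whether the held-out R² rises with scale. The unrelated control gives a floor against which any increase can be read.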

Load-bearing premise

That increasing similarity between representation spaces indicates convergence to an objective underlying model rather than shared inductive biases, overlapping training data, or architectural commonalities.

What would settle it

A demonstration that further scaling of vision and language models causes their pairwise distance measurements to diverge or stabilize at different alignments instead of continuing to match more closely.

Original abstract

We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple domains, the ways by which different neural networks represent data are becoming more aligned. Next, we demonstrate convergence across data modalities: as vision models and language models get larger, they measure distance between datapoints in a more and more alike way. We hypothesize that this convergence is driving toward a shared statistical model of reality, akin to Plato's concept of an ideal reality. We term such a representation the platonic representation and discuss several possible selective pressures toward it. Finally, we discuss the implications of these trends, their limitations, and counterexamples to our analysis.
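To make "measuring distance between datapoints in a more and more alike way" concrete, here is a minimal sketch of one possible alignment score. The text above does not specify a metric, so the choice of mutual k-nearest-neighbor overlap is an assumption made for illustration, as are the function names and the synthetic features. The score asks, for each paired sample, what fraction of its k nearest neighbors in one representation space are also its nearest neighbors in the other.

```python
import numpy as np

def knn_indices(feats, k):
    """Indices of each row's k nearest neighbors by cosine similarity."""
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # a point is not its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]

def mutual_knn_alignment(feats_a, feats_b, k=10):
    """Mean fraction of shared k-nearest neighbors across two spaces.

    Row i of feats_a and row i of feats_b must describe the same datapoint.
    """
    nn_a = knn_indices(feats_a, k)
    nn_b = knn_indices(feats_b, k)
    overlap = [len(set(a) & set(b)) / k for a, b in zip(nn_a, nn_b)]
    return float(np.mean(overlap))

# Synthetic stand-ins (assumption): two views of one shared latent structure,
# plus an unrelated control with no shared structure.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 8))
vision_like = latent @ rng.normal(size=(8, 32))
text_like = latent @ rng.normal(size=(8, 48))
unrelated = rng.normal(size=(200, 48))

aligned_score = mutual_knn_alignment(vision_like, text_like)
baseline_score = mutual_knn_alignment(vision_like, unrelated)
```

A neighbor-overlap score depends only on the ordering of similarities within each space, so it can compare models whose embedding dimensions and scales differ.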

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper surveys examples of converging representations in neural networks across time, domains, and modalities; presents new measurements showing that larger vision and language models increasingly align in how they measure distances between datapoints; hypothesizes that this reflects convergence toward a shared statistical model of reality termed the 'platonic representation'; discusses possible selective pressures; and outlines implications, limitations, and counterexamples.

Significance. If supported by stronger controls and falsifiable tests, the hypothesis could provide a unifying lens for representation learning, with implications for scaling laws, cross-modal transfer, and the emergence of shared world models in AI. The work compiles existing literature effectively but adds only limited new measurements without distinguishing the claimed mechanism from alternatives.

major comments (3)
  1. [modality-convergence demonstration] The cross-modal distance alignment demonstration lacks ablations that hold training data, objectives, or architectures fixed while varying only the underlying 'reality model'; without these, the measurements cannot rule out shared inductive biases or corpus overlap as the primary driver (see the modality-convergence section and associated figures).
  2. [selective pressures section] The selective-pressures discussion invokes the platonic-representation hypothesis to explain the same alignment observations used to motivate it, creating a circularity that weakens the causal claim; additional independent evidence or a formal model of the pressures is needed (see the selective pressures section).
  3. [implications and limitations sections] No quantitative bounds, falsifiable predictions, or controlled experiments are provided that would distinguish convergence to an objective statistical model from simpler alternatives such as scaling-induced similarity; this leaves the central hypothesis underdetermined (see the implications and limitations sections).
minor comments (2)
  1. [Abstract] The abstract introduces the 'platonic representation' without a concise operational definition or mathematical characterization that could be referenced later.
  2. [figures] Figure captions and axis labels in the distance-alignment plots could more explicitly state the distance metric and normalization used.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights opportunities to strengthen the empirical claims and clarify the logical structure of the hypothesis. We address each major comment below, proposing targeted revisions to the manuscript where appropriate while defending the core contributions on substantive grounds.

point-by-point responses
  1. Referee: The cross-modal distance alignment demonstration lacks ablations that hold training data, objectives, or architectures fixed while varying only the underlying 'reality model'; without these, the measurements cannot rule out shared inductive biases or corpus overlap as the primary driver (see the modality-convergence section and associated figures).

    Authors: We agree that controlled ablations isolating the 'reality model' from inductive biases and data overlap would provide stronger causal evidence. Such experiments are computationally intensive for frontier-scale models and were not feasible within the scope of this work. However, the observed alignment trend holds across independently developed models (e.g., different vision transformers and language models trained by separate groups on partially overlapping but non-identical corpora), and we cite supporting literature on convergence under varied objectives. In revision, we will expand the modality-convergence section to explicitly discuss these potential confounds, include additional controls where smaller-scale proxies are available, and qualify the interpretation accordingly.
    revision: partial

  2. Referee: The selective-pressures discussion invokes the platonic-representation hypothesis to explain the same alignment observations used to motivate it, creating a circularity that weakens the causal claim; additional independent evidence or a formal model of the pressures is needed (see the selective pressures section).

    Authors: The selective pressures section is framed as a hypothesis-generating discussion rather than a causal proof. The alignment observations motivate the platonic representation as a unifying description; the pressures (e.g., optimization for generalization, data efficiency, and cross-task transfer) are proposed mechanisms drawn from independent scaling-law results and representation-learning theory. To reduce any perceived circularity, we will revise the section to separate the descriptive hypothesis from the mechanistic discussion, add citations to prior work on inductive biases that predate our measurements, and clarify that the pressures are testable via future interventions such as controlled training regimes.
    revision: partial

  3. Referee: No quantitative bounds, falsifiable predictions, or controlled experiments are provided that would distinguish convergence to an objective statistical model from simpler alternatives such as scaling-induced similarity; this leaves the central hypothesis underdetermined (see the implications and limitations sections).

    Authors: We acknowledge that the current version presents the hypothesis primarily through qualitative trends and literature synthesis rather than quantitative bounds or explicit falsification tests. In the revised manuscript, we will add a dedicated subsection in Implications that articulates concrete, measurable predictions (e.g., expected growth in cross-modal distance correlation as a function of parameter count, and thresholds beyond which scaling alone cannot explain residual alignment). We will also expand Limitations to contrast the platonic account against pure scaling-induced similarity and outline feasible controlled experiments using matched smaller models.
    revision: yes

Circularity Check

0 steps flagged

No significant circularity; hypothesis framed around external observations

full rationale

The paper surveys existing literature on representation convergence across models and domains, then reports new cross-modal measurements showing increasing alignment in distance structures between vision and language models as scale increases. The platonic representation is introduced explicitly as a hypothesis to interpret these trends, followed by discussion of possible selective pressures and implications. No derivation step reduces by construction to its own inputs: there are no fitted parameters renamed as predictions, no self-definitional loops where the target quantity is presupposed in the premise, and no load-bearing uniqueness theorems imported solely via self-citation. The central claim remains an interpretive hypothesis whose support is drawn from cited external results and the reported measurements rather than tautological re-expression of the same data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that representation similarity metrics capture meaningful convergence to an objective reality model. No free parameters are fitted. The platonic representation is introduced as a new conceptual entity without independent falsifiable evidence.

axioms (1)
  • [domain assumption] Representations learned by neural networks can be compared across models and modalities using distance metrics that reflect semantic similarity.
    Invoked when claiming convergence from alignment of distance measures.
invented entities (1)
  • platonic representation [no independent evidence]
    purpose: A shared statistical model of reality that different neural networks converge toward.
    Postulated to explain observed alignment; no independent evidence or falsifiable prediction is provided beyond the alignment itself.
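The domain assumption in this ledger, that distance structure can be compared across models, can be illustrated with a short sketch. Everything here is an assumption for illustration rather than the paper's method: the sketch computes pairwise Euclidean distances within each representation space and then takes the rank correlation between the two sets of distances, using hypothetical helper names and synthetic data.

```python
import numpy as np

def pairwise_distances(feats):
    """Euclidean distance between every pair of rows."""
    sq = np.sum(feats ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * feats @ feats.T
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding

def ranks(values):
    """Ordinal ranks (ties are not averaged; fine for continuous data)."""
    order = np.argsort(values)
    out = np.empty(len(values), dtype=float)
    out[order] = np.arange(len(values))
    return out

def distance_rank_correlation(feats_a, feats_b):
    """Rank correlation between the two spaces' pairwise distances."""
    upper = np.triu_indices(len(feats_a), k=1)  # each unordered pair once
    dist_a = pairwise_distances(feats_a)[upper]
    dist_b = pairwise_distances(feats_b)[upper]
    return float(np.corrcoef(ranks(dist_a), ranks(dist_b))[0, 1])

# Synthetic check (assumption): a rotated copy preserves all distances,
# while an independent sample shares no distance structure.
rng = np.random.default_rng(0)
points = rng.normal(size=(100, 16))
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
rotated_score = distance_rank_correlation(points, points @ rotation)
unrelated_score = distance_rank_correlation(points, rng.normal(size=(100, 16)))
```

A high score shows only that two spaces order distances similarly. Whether that ordering reflects semantic similarity is exactly what the axiom takes for granted.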

pith-pipeline@v0.9.0 · 5419 in / 1145 out tokens · 41970 ms · 2026-05-15T05:58:39.603476+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    We hypothesize that this convergence is driving toward a shared statistical model of reality, akin to Plato's concept of an ideal reality. We term such a representation the platonic representation.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    The Multitask Scaling Hypothesis: There are fewer representations that are competent for N tasks than there are for M < N tasks. As we train more general models that solve more tasks at once, we should expect fewer possible solutions.
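The counting intuition in the quoted Multitask Scaling Hypothesis can be written as a set inclusion. The notation is an assumption introduced here, not taken from the paper: $\mathcal{H}_i$ denotes the set of representations competent for task $i$, and the $N$ tasks are assumed to include the $M$ tasks.

```latex
\[
  \mathcal{H}_{1:N} \;=\; \bigcap_{i=1}^{N} \mathcal{H}_i
  \;\subseteq\; \bigcap_{i=1}^{M} \mathcal{H}_i \;=\; \mathcal{H}_{1:M},
  \qquad M < N .
\]
```

The inclusion alone shows that adding tasks can never enlarge the set of competent representations. The stronger reading, strictly fewer solutions, additionally requires that at least one of the extra tasks rules out some representation that was competent for the first $M$.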

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 22 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Deep Learning as Neural Low-Degree Filtering: A Spectral Theory of Hierarchical Feature Learning

    cs.LG 2026-05 unverdicted novelty 8.0

    Neural LoFi models deep learning as layer-wise spectral filtering that selects maximal low-degree correlations, yielding a tractable surrogate for hierarchical representation learning beyond the lazy regime.

  2. CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios

    cs.CR 2026-05 unverdicted novelty 7.0

    LLM agents exhibit persistent attack-selection biases as fixed traits independent of success rates, with a bias momentum effect that resists steering and yields no performance gain.

  3. Dimensional Coactivation for Representational Consistency in Frozen Vision Foundation Models

    cs.CV 2026-05 unverdicted novelty 7.0

    DCA measures intra-sample representational consistency in frozen vision models by checking per-dimension coactivation across regions, achieving 0.91-0.93 AUC in deepfake detection with DINOv3 features.

  4. Rethinking Model Selection in VLM Through the Lens of Gromov-Wasserstein Distance

    cs.CV 2026-05 unverdicted novelty 7.0

    Gromov-Wasserstein distance between modalities provides a stronger, inference-only predictor of final VLM performance than conventional encoder metrics, backed by theory linking it to cross-modal learnability and veri...

  5. Geometry-Aware CLIP Retrieval via Local Cross-Modal Alignment and Steering

    cs.CV 2026-04 unverdicted novelty 7.0

    Neighborhood re-ranking via Hungarian matching and query-conditioned local steering improve CLIP retrieval on attribute-binding and compositional tasks by addressing local geometric inconsistencies.

  6. Positive Alignment: Artificial Intelligence for Human Flourishing

    cs.AI 2026-05 unverdicted novelty 6.0

    Positive Alignment introduces AI systems that support human flourishing pluralistically and proactively while remaining safe, as a necessary complement to traditional safety-focused alignment research.

  7. What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion

    cs.CV 2026-05 unverdicted novelty 6.0

    Prior-Aligned AutoEncoders shape latent manifolds with spatial coherence, local continuity, and global semantics to improve latent diffusion, achieving SOTA gFID 1.03 on ImageNet 256x256 with up to 13x faster convergence.

  8. Emergent Symbolic Structure in Health Foundation Models: Extraction, Alignment, and Cross-Modal Transfer

    cs.LG 2026-05 unverdicted novelty 6.0

    Health foundation model embeddings contain an interpretable symbolic organization shared across modalities that supports cross-domain transfer without joint training.

  9. Compared to What? Baselines and Metrics for Counterfactual Prompting

    cs.CL 2026-05 conditional novelty 6.0

    Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistica...

  10. Borrowed Geometry: Computational Reuse of Frozen Text-Pretrained Transformer Weights Across Modalities

    cs.LG 2026-05 unverdicted novelty 6.0

    Frozen text-pretrained transformer weights transfer across modalities through a thin interface, achieving SOTA on a robotic task and parity on decision-making with far fewer trainable parameters.

  11. Human Cognition in Machines: A Unified Perspective of World Models

    cs.RO 2026-04 unverdicted novelty 6.0

    The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and pro...

  12. The Cost of Language: Centroid Erasure Exposes and Exploits Modal Competition in Multimodal Language Models

    cs.CL 2026-04 unverdicted novelty 6.0

    Centroid erasure shows language representations overshadow vision in multimodal models, and text-centroid contrastive decoding recovers substantial accuracy on visual reasoning tasks.

  13. Dead Weights, Live Signals: Feedforward Graphs of Frozen Language Models

    cs.LG 2026-04 unverdicted novelty 6.0

    A feedforward graph of heterogeneous frozen LLMs linked by linear projections in a shared latent space outperforms single models on ARC-Challenge, OpenBookQA, and MMLU using just 17.6M trainable parameters.

  14. The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

    cs.LG 2026-04 unverdicted novelty 6.0

    The Master Key Hypothesis states that capabilities are low-dimensional directions transferable across models through linear subspace alignment, with UNLOCK demonstrating gains such as 12.1% accuracy improvement on MAT...

  15. Rethinking the Good Enough Embedding for Easy Few-Shot Learning

    cs.CV 2026-05 conditional novelty 5.0

    Frozen DINOv2-L features with k-NN classification and PCA/ICA refinement achieve state-of-the-art few-shot performance on four benchmarks without any backpropagation or fine-tuning.

  16. Control Charts for Multi-agent Systems

    cs.MA 2026-05 unverdicted novelty 5.0

    Adaptive control charts can monitor learning multi-agent systems but are vulnerable to gradual adversarial defection, revealing a fundamental tradeoff between allowing agents to learn and maintaining security against ...

  17. ATLAS: Constitution-Conditioned Latent Geometry and Redistribution Across Language Models and Neural Perturbation Data

    cs.LG 2026-04 unverdicted novelty 5.0

    ATLAS shows constitutions induce recoverable latent geometry in LLMs that redistributes but remains detectable across models and neural perturbation data via source-defined families and AUC separations.

  18. Language-Pretraining-Induced Bias: A Strong Foundation for General Vision Tasks

    cs.CV 2026-04 unverdicted novelty 5.0

    Random label bridge training aligns LLM parameters with vision tasks, and partial training of certain layers often suffices due to their foundational properties.

  19. Positive Alignment: Artificial Intelligence for Human Flourishing

    cs.AI 2026-05 unverdicted novelty 4.0

    Positive Alignment is introduced as a distinct AI agenda that supports human flourishing through pluralistic and context-sensitive design, complementing traditional safety-focused alignment.

  20. Neuroscience-Inspired Analyses of Visual Interestingness in Multimodal Transformers

    cs.CV 2026-05 unverdicted novelty 4.0

    Human visual interestingness is linearly decodable from final-layer embeddings in Qwen3-VL-8B and becomes progressively more structured across vision and language layers without explicit supervision.

  21. Measuring AI Reasoning: A Guide for Researchers

    cs.AI 2026-05 unverdicted novelty 4.0

    Reasoning in language models should be measured by the faithfulness and validity of their multi-step search processes and intermediate traces, not final-answer accuracy.

  22. Toward Aristotelian Medical Representations: Backpropagation-Free Layer-wise Analysis for Interpretable Generalized Metric Learning on MedMNIST

    cs.CV 2026-04 unverdicted novelty 4.0

    A-ROM delivers competitive MedMNIST performance via pretrained ViT metric spaces, a concept dictionary, and kNN without backpropagation or fine-tuning, framed as interpretable few-shot learning under the Platonic Repr...

Reference graph

Works this paper leans on

272 extracted references · 272 canonical work pages · cited by 21 Pith papers · 35 internal anchors

  1. [1]

    Cognitive Systems Research , volume =

    Explanatory models in neuroscience: Part 2--constraint-based intelligibility , author=. Cognitive Systems Research , volume =

  2. [2]

    Communications of the ACM , volume=

    Imagenet classification with deep convolutional neural networks , author=. Communications of the ACM , volume=. 2017 , publisher=

  3. [3]

    Nature communications , volume=

    Input--output maps are strongly biased towards simple outputs , author=. Nature communications , volume=. 2018 , publisher=

  4. [4]

    arXiv preprint arXiv:2201.10005 , year=

    Text and code embeddings by contrastive pre-training , author=. arXiv preprint arXiv:2201.10005 , year=

  5. [5]

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages=

    Efficient indexing of billion-scale datasets of deep descriptors , author=. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages=

  6. [6]

    Efficient Estimation of Word Representations in Vector Space

    Efficient estimation of word representations in vector space , author=. arXiv preprint arXiv:1301.3781 , year=

  7. [7]

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

    Lit: Zero-shot transfer with locked-image text tuning , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

  8. [10]

    International conference on machine learning , pages=

    Similarity of neural network representations revisited , author=. International conference on machine learning , pages=. 2019 , organization=

  9. [11]

    Advances in neural information processing systems , volume=

    Svcca: Singular vector canonical correlation analysis for deep learning dynamics and interpretability , author=. Advances in neural information processing systems , volume=

  10. [12]

    steerability

    On the" steerability" of generative adversarial networks , author=. arXiv preprint arXiv:1907.07171 , year=

  11. [13]

    arXiv preprint arXiv:2211.01201 , year=

    Human alignment of neural network representations , author=. arXiv preprint arXiv:2211.01201 , year=

  12. [14]

    International journal of computer vision , volume=

    Imagenet large scale visual recognition challenge , author=. International journal of computer vision , volume=. 2015 , publisher=

  13. [15]

    Dialectica , volume=

    The truth in realism , author=. Dialectica , volume=. 1989 , publisher=

  14. [16]

    International conference on algorithmic learning theory , pages=

    Measuring statistical dependence with Hilbert-Schmidt norms , author=. International conference on algorithmic learning theory , pages=. 2005 , organization=

  15. [17]

    Proceedings of the National Academy of Sciences , volume=

    The neural architecture of language: Integrative modeling converges on predictive processing , author=. Proceedings of the National Academy of Sciences , volume=. 2021 , publisher=

  16. [18]

    Proceedings of the national academy of sciences , volume=

    Performance-optimized hierarchical models predict neural responses in higher visual cortex , author=. Proceedings of the national academy of sciences , volume=. 2014 , publisher=

  17. [19]

    BERTScore: Evaluating Text Generation with BERT

    Bertscore: Evaluating text generation with bert , author=. arXiv preprint arXiv:1904.09675 , year=

  18. [20]

    arXiv preprint arXiv:2110.04374 , year=

    A few more examples may be worth billions of parameters , author=. arXiv preprint arXiv:2110.04374 , year=

  19. [21]

    arXiv preprint arXiv:2305.12827 , year=

    Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models , author=. arXiv preprint arXiv:2305.12827 , year=

  20. [22]

    Science , volume=

    Toward a universal law of generalization for psychological science , author=. Science , volume=. 1987 , publisher=

  21. [23]

    Part I , author=

    A formal theory of inductive inference. Part I , author=. Information and control , volume=. 1964 , publisher=

  22. [24]

    Proceedings of the 58th annual meeting of the association for computational linguistics , pages=

    Climbing towards NLU: On meaning, form, and understanding in the age of data , author=. Proceedings of the 58th annual meeting of the association for computational linguistics , pages=

  23. [26]

    2024 , booktitle=

    Quantifying Representation Reliability in Self-Supervised Learning Models , author=. 2024 , booktitle=

  24. [27]

    NeurIPS , year =

    Liu, Haotian and Li, Chunyuan and Wu, Qingyang and Lee, Yong Jae , title =. NeurIPS , year =

  25. [28]

    International Conference on Machine Learning , pages=

    Contrastive learning inverts the data generating process , author=. International Conference on Machine Learning , pages=. 2021 , organization=

  26. [29]

    Journal of biomedical informatics , volume=

    Language models are an effective representation learning technique for electronic health record data , author=. Journal of biomedical informatics , volume=. 2021 , publisher=

  27. [30]

    Transactions of the American mathematical society , volume=

    Theory of reproducing kernels , author=. Transactions of the American mathematical society , volume=

  28. [31]

    1998 , publisher=

    Learning with kernels , author=. 1998 , publisher=

  29. [34]

    Advances in neural information processing systems , volume=

    Learning curves: Asymptotic values and rate of convergence , author=. Advances in neural information processing systems , volume=

  30. [35]

    Advances in Neural Information Processing Systems , volume=

    Procedural image programs for representation learning , author=. Advances in Neural Information Processing Systems , volume=

  31. [36]

    Advances in Neural Information Processing Systems , year=

    Learning to See by Looking at Noise , author=. Advances in Neural Information Processing Systems , year=

  32. [37]

    The visual task adaptation benchmark , author=

  33. [39]

    Touvron, Hugo and Martin, Louis and Stone, Kevin and Albert, Peter and Almahairi, Amjad and Babaei, Yasmine and Bashlykov, Nikolay and Batra, Soumya and Bhargava, Prajjwal and Bhosale, Shruti and others , journal=

  34. [40]

    arXiv preprint arXiv:2102.06701 , year=

    Explaining neural scaling laws , author=. arXiv preprint arXiv:2102.06701 , year=

  35. [43]

    Gemma: Open Models Based on Gemini Research and Technology

    Gemma: Open models based on gemini research and technology , author=. arXiv preprint arXiv:2403.08295 , year=

  36. [45]

    International conference on machine learning , pages=

    Learning transferable visual models from natural language supervision , author=. International conference on machine learning , pages=. 2021 , organization=

  37. [46]

    Geng, Xinyang and Liu, Hao , title =

  38. [47]

    OpenAI blog , volume=

    Language models are unsupervised multitask learners , author=. OpenAI blog , volume=

  39. [48]

    The Bell system technical journal , volume=

    A mathematical theory of communication , author=. The Bell system technical journal , volume=. 1948 , publisher=

  40. [49]

    International conference on machine learning , pages=

    Essentially no barriers in neural network energy landscape , author=. International conference on machine learning , pages=. 2018 , organization=

  41. [50]

    Advances in Neural Information Processing Systems , volume=

    Uniform convergence may be unable to explain generalization in deep learning , author=. Advances in Neural Information Processing Systems , volume=

  42. [51]

    Advances in neural information processing systems , volume=

    Loss surfaces, mode connectivity, and fast ensembling of dnns , author=. Advances in neural information processing systems , volume=

  43. [52]

    Topology and Geometry of Half-Rectified Network Optimization

    Topology and geometry of half-rectified network optimization , author=. arXiv preprint arXiv:1611.01540 , year=

  44. [53]

    International Conference on Machine Learning , pages=

    Linear mode connectivity and the lottery ticket hypothesis , author=. International Conference on Machine Learning , pages=. 2020 , organization=

  45. [54]

    Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

    Rosetta neurons: Mining the common units in a model zoo , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

  46. [55]

    Proceedings of IEEE International Conference on Systems, Man and Cybernetics, NY , year=

    Learning how the world works: Specifications for predictive networks in robots and brains , author=. Proceedings of IEEE International Conference on Systems, Man and Cybernetics, NY , year=

  47. [58]

    Lian, Long and Li, Boyi and Yala, Adam and Darrell, Trevor , journal=

  48. [59]

    arXiv preprint arXiv:2311.17076 , year=

    Compositional chain-of-thought prompting for large multimodal models , author=. arXiv preprint arXiv:2311.17076 , year=

  49. [61]

    1995 , publisher=

    The Quark and the Jaguar: Adventures in the Simple and the Complex , author=. 1995 , publisher=

  50. [62]

    Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010.

  51. [63]

    Prevalence of neural collapse during the terminal phase of deep learning training. Proceedings of the National Academy of Sciences, 2020.

  52. [64]

    Best-buddies similarity: Robust template matching using mutual nearest neighbors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.

  53. [65]

    Lectures on the 'Republic' of Plato. 1897.

  54. [66]

    Optimal convergence rates for convex distributed optimization in networks. Journal of Machine Learning Research.

  55. [67]

    Three kinds of scientific realism. The Philosophical Quarterly (1950-), 1982.

  56. [68]

    Reconstructing scientific realism to rebut the pessimistic meta-induction. Philosophy of Science, 2007.

  57. [69]

    In defense of convergent realism. Philosophy of Science, 1982.

  58. [70]

    The Rationality of Science. 1981.

  59. [71]

    Geometry of the loss landscape in overparameterized neural networks: Symmetries and invariances. International Conference on Machine Learning, 2021.

  60. [72]

    Local minima and plateaus in hierarchical structures of multilayer perceptrons. Neural Networks, 2000.

  61. [73]

    Weight-space symmetry in deep networks gives rise to permutation saddles, connected by equal-loss valleys across the loss landscape. arXiv preprint arXiv:1907.02911.

  62. [74]

    Optimizing mode connectivity via neuron alignment. Advances in Neural Information Processing Systems.

  63. [77]

    Segment anything in medical images. Nature Communications, 2024.

  64. [78]

    The role of permutation invariance in linear mode connectivity of neural networks. arXiv preprint arXiv:2110.06296.

  65. [79]

    Large scale structure of neural network loss landscapes. Advances in Neural Information Processing Systems.

  66. [80]

    Learning internal representations by error propagation. 1985.

  67. [81]

    Bad global minima exist and SGD can reach them. Advances in Neural Information Processing Systems.

  68. [82]

    Understanding image representations by measuring their equivariance and equivalence. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  69. [84]

    Revisiting model stitching to compare neural representations. Advances in Neural Information Processing Systems.

  70. [85]

    LIMA: Less Is More for Alignment. arXiv preprint arXiv:2305.11206.

  71. [86]

    Effect of scale on catastrophic forgetting in neural networks. International Conference on Learning Representations.

  72. [87]

    An empirical investigation of the role of pre-training in lifelong learning. arXiv preprint arXiv:2112.09153.

  73. [88]

    Improving image generation with better captions. Computer Science. https://cdn.openai.com/papers/dall-e-3.pdf

  74. [89]

    Wu, Tsung-Han and Lian, Long and Gonzalez, Joseph E and Li, Boyi and Darrell, Trevor. Self-correcting

  75. [90]

    Lian, Long and Shi, Baifeng and Yala, Adam and Darrell, Trevor and Li, Boyi.

  76. [91]

    Wide neural networks forget less catastrophically. International Conference on Machine Learning, 2022.

  77. [92]

    LoRA: Low-Rank Adaptation of Large Language Models. arXiv preprint arXiv:2106.09685.

  78. [93]

    Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. International Conference on Machine Learning, 2022.

  79. [94]

    Linear connectivity reveals generalization strategies. arXiv preprint arXiv:2205.12411.

  80. [95]

    Machine super intelligence.

Showing first 80 references.