A mathematical theory of balancing relational generalization and memorization
Pith reviewed 2026-05-25 05:53 UTC · model grok-4.3
The pith
Kernel ridge regression balances transitive inference and exception memorization only under specific representational geometries.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Kernel ridge regression models can solve transitive inference while correctly handling one exception, provided the representational geometry separates the general rule from the exception in kernel space; the same models fail for other geometries even when the task parameters are held fixed.
What carries the argument
An analytical solution of kernel ridge regression on embeddings of ordered item pairs, where the training set includes one explicit exception to the transitive rule.
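The machinery can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code. The task is assumed to have 7 items ranked 0 (top) to 6. Training uses the adjacent pairs in both orders, plus a hypothetical exception pair (1, 4) whose label is flipped. The geometry family is also an assumption: it mixes an additive pair code (concatenated one-hot item codes) with a conjunctive one (a one-hot code per ordered pair), weighted by a hypothetical parameter c. The fit uses the standard KRR closed form alpha = (K + lam I)^{-1} y.

```python
import itertools

import numpy as np

N_ITEMS = 7
EXCEPTION = (1, 4)  # hypothetical: item 4 beats item 1 although it ranks lower


def label(i, j):
    """+1 if item i outranks item j (lower index), flipped for the exception pair."""
    y = 1.0 if i < j else -1.0
    if {i, j} == set(EXCEPTION):
        y = -y
    return y


def training_pairs(n=N_ITEMS):
    """Adjacent pairs in both orders, plus the exception pair in both orders."""
    pairs = []
    for k in range(n - 1):
        pairs += [(k, k + 1), (k + 1, k)]
    i, j = EXCEPTION
    return pairs + [(i, j), (j, i)]


def kernel(p, q, c):
    """Assumed geometry family: (1 - c) * additive + c * conjunctive.

    additive: linear kernel on concatenated one-hot codes [e_i; e_j]
    conjunctive: linear kernel on a one-hot code of the ordered pair itself
    """
    additive = float(p[0] == q[0]) + float(p[1] == q[1])
    conjunctive = float(p == q)
    return (1.0 - c) * additive + c * conjunctive


def fit_predict(c, lam=1e-3, n=N_ITEMS):
    """Closed-form KRR on the training pairs; returns predictions for every ordered pair."""
    train = training_pairs(n)
    y = np.array([label(i, j) for i, j in train])
    K = np.array([[kernel(p, q, c) for q in train] for p in train])
    alpha = np.linalg.solve(K + lam * np.eye(len(train)), y)
    preds = {
        p: float(sum(a * kernel(p, q, c) for a, q in zip(alpha, train)))
        for p in itertools.permutations(range(n), 2)
    }
    return preds, set(train)
```

Under these assumptions, the two ends of the family behave differently:

- With c = 1, the model memorizes the exception but predicts exactly 0 for every untrained pair.
- With c = 0, the model cannot fit the exception. An additive code restricts scores to the form a_i + b_j, and the adjacent pairs leave no way to fit the flipped (1, 4) label exactly.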
If this is right
- Generalization on the task requires geometries in which the exception does not produce destructive interference with the transitive kernel structure.
- Pretrained language models finetuned on the task will exhibit both rule-consistent generalization and the systematic mistakes that follow from the geometry analysis.
- The presence of even one exception makes the problem mechanistically harder than standard transitive inference, because the geometry must now be tuned to protect both the rule and the exception.
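The point that one exception forces a trade-off can be probed with a small sweep. The sketch below uses a hypothetical toy setup: 7 ranked items, a flipped pair (1, 4), and an assumed additive/conjunctive mixture with weight c, written as explicit feature maps. For each c it reports the score on the exception pair and the fraction of untrained pairs whose sign still follows the transitive rule. It illustrates the trade-off under those assumptions and does not reproduce the paper's analysis.

```python
import itertools

import numpy as np

N = 7
EXC = (1, 4)  # hypothetical exception pair: item 4 beats item 1


def features(pairs, c, n=N):
    """Explicit feature map for the assumed mixture geometry.

    sqrt(1 - c) * [one-hot(i); one-hot(j)]  -> additive part, shared across pairs
    sqrt(c) * one-hot(i * n + j)            -> conjunctive part, pairs orthogonal
    """
    eye = np.eye(n)
    additive = np.hstack([eye[pairs[:, 0]], eye[pairs[:, 1]]])
    conjunctive = np.eye(n * n)[pairs[:, 0] * n + pairs[:, 1]]
    return np.hstack([np.sqrt(1.0 - c) * additive, np.sqrt(c) * conjunctive])


def labels(pairs):
    """Transitive labels (+1 if the first item ranks higher), flipped on the exception."""
    y = np.where(pairs[:, 0] < pairs[:, 1], 1.0, -1.0)
    is_exc = np.isin(pairs[:, 0], EXC) & np.isin(pairs[:, 1], EXC)
    return np.where(is_exc, -y, y)


def sweep(cs, lam=1e-3, n=N):
    """Return (c, score on the exception pair, rule accuracy on untrained pairs) per c."""
    adjacent = [(k, k + 1) for k in range(n - 1)]
    train_list = adjacent + [(j, i) for i, j in adjacent] + [EXC, EXC[::-1]]
    train = np.array(train_list)
    test = np.array([p for p in itertools.permutations(range(n), 2) if p not in train_list])
    y = labels(train)
    results = []
    for c in cs:
        phi = features(train, c)
        alpha = np.linalg.solve(phi @ phi.T + lam * np.eye(len(train)), y)
        w = phi.T @ alpha  # primal weights of the same KRR solution
        f_exc = float((features(np.array([EXC]), c) @ w)[0])
        gen_acc = float(np.mean(np.sign(features(test, c) @ w) == labels(test)))
        results.append((c, f_exc, gen_acc))
    return results
```

Calling `sweep(np.linspace(0, 1, 11))` exposes the two failure modes at the ends of the family. At c = 0 the exception is not fit, and at c = 1 there is no transitive generalization. How the middle of the range behaves depends on the assumed geometry and the ridge strength.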
Where Pith is reading between the lines
- The same geometry sensitivity is likely to appear in other relational tasks that combine a dominant rule with isolated counterexamples.
- Controlling embedding geometry during pretraining or fine-tuning could be used to improve exception handling without sacrificing rule generalization.
- Direct tests on transformer architectures rather than kernel proxies would clarify whether the predicted geometry dependence survives in modern networks.
Load-bearing premise
Kernel ridge regression behavior is representative of how neural networks learn on this relational task.
What would settle it
If language models finetuned on ordered relations with one exception neither generalize according to the transitive rule nor produce the specific error pattern predicted by the kernel analysis, the claimed link between geometry and performance would not hold.
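This test can be made operational with two numbers:

- how often the finetuned model's choices follow the transitive rule;
- how well the kernel model's predicted margins rank the model's observed margins.

The helper below is a hypothetical sketch. Its name, its inputs (per-pair KRR scores, per-pair probabilities that the LM picks the first item, and rule labels) and the use of a Spearman rank correlation are all assumptions. The toy numbers are made up, not data from the paper. Low values on both numbers would be the falsifying outcome described above.

```python
import numpy as np


def rank(x):
    """Ranks 0..len(x)-1 (assumes no ties)."""
    return np.argsort(np.argsort(x)).astype(float)


def settle(krr_scores, lm_p_first, rule_labels):
    """Hypothetical helper: return (rule_accuracy, error_pattern_corr) for a set of test pairs.

    krr_scores:  predicted KRR outputs f(i, j) per pair (positive = first item wins)
    lm_p_first:  observed probability that the finetuned LM picks the first item
    rule_labels: +1 / -1 according to the transitive rule
    """
    krr_scores, lm_p_first, rule_labels = map(np.asarray, (krr_scores, lm_p_first, rule_labels))
    lm_choice = np.where(lm_p_first > 0.5, 1, -1)
    rule_accuracy = float(np.mean(lm_choice == rule_labels))
    # Margins in the rule's direction: low or negative values mark likely errors.
    predicted_margin = krr_scores * rule_labels
    observed_margin = (2.0 * lm_p_first - 1.0) * rule_labels
    corr = np.corrcoef(rank(predicted_margin), rank(observed_margin))[0, 1]
    return rule_accuracy, float(corr)


# Made-up toy inputs: an "LM" whose choice probabilities track the KRR scores exactly.
scores = np.array([2.0, 0.5, -0.3, 1.2, -1.5])
rule = np.array([1, 1, 1, -1, -1])
p_first = 1.0 / (1.0 + np.exp(-scores))
print(settle(scores, p_first, rule))
```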
Original abstract
Humans, animals, and modern machine learning models exhibit impressive abilities to learn complex behaviors and generalize these behaviors to unseen situations. This ability requires us to learn rules and regularities that allow for such generalizations. At the same time, in most complex environments, any rule will have its exceptions. How do learning systems balance between learning general regularities and memorizing exceptions? We argue that a lack of task paradigms has hindered the study of this essential ability. To address this gap, we introduce a novel task, transitive inference with exceptions, that tests for relational generalization and memorization of an exception to the relational rule. We then analytically characterize the behavior of a simple, theoretically tractable model of neural network learning (kernel ridge regression) across a broad family of representations and task parameters. We find that these models can balance between relational generalization and memorization, but unlike for transitive inference without an exception, successful generalization is sensitive to the specific representational geometry. We explain why this task is more challenging mechanistically by drawing on our analytical theory. Finally, we validate our theoretical insights in pretrained language models that are finetuned on ordered relations, finding that these models successfully generalize according to the transitive rule, but also make the kinds of systematic mistakes predicted by our theory. Overall, our theory shows how learning systems can balance between relational generalization and memorization, explains how this can go wrong, and emphasizes the need for new task paradigms designed to probe this ability.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the transitive inference with exceptions task to probe how learning systems balance relational generalization against memorization of exceptions. It provides an analytical characterization of kernel ridge regression (KRR) across a family of representations and task parameters, concluding that these models can achieve the balance but that successful generalization becomes sensitive to representational geometry precisely when an exception is present (unlike the exception-free case). The authors explain the mechanistic source of this sensitivity, then show that the predicted error patterns appear when pretrained language models are finetuned on ordered relations.
Significance. If the KRR analysis is internally sound and the geometry-sensitivity predictions are borne out beyond the convex fixed-feature setting, the work supplies a concrete mathematical account of the generalization-memorization trade-off on relational tasks and a falsifiable link to observed LM behavior. The explicit analytical treatment of KRR and the direct comparison to LM error patterns are strengths that would be valuable to the community.
major comments (2)
- [Abstract / analytical characterization] Abstract (paragraph on analytical characterization) and the modeling section: the central claim that 'successful generalization is sensitive to the specific representational geometry' is derived under kernel ridge regression with fixed features. Because KRR solves a convex problem in a predetermined feature space, it cannot capture representation learning under gradient descent; the manuscript does not demonstrate that the same geometry dependence survives once features are allowed to adapt, which is required to underwrite the mechanistic explanation offered for language-model behavior.
- [LM validation experiments] Validation section (LM experiments): the reported systematic mistakes in finetuned LMs are said to match the theory's predictions, yet the manuscript does not report controls that isolate representational geometry (e.g., by varying embedding dimensionality or kernel bandwidth while holding other factors fixed). Without such controls it remains unclear whether the observed errors are produced by the same mechanism identified in the KRR analysis.
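In the kernel model itself, a control of the kind the referee asks for is a one-parameter sweep. The task, the item codes and the ridge penalty are held fixed, and only the kernel bandwidth varies. The sketch below is not an analysis from the manuscript and rests on these assumptions:

- one-hot item codes;
- pair embeddings formed by concatenation;
- a Gaussian (RBF) kernel with a hypothetical bandwidth `sigma`;
- a toy task with 7 ranked items and a flipped pair (1, 4).

For concatenated one-hot codes, the squared distance between pairs (i, j) and (k, l) is 2 * ([i != k] + [j != l]). The RBF kernel therefore reduces to exp(-([i != k] + [j != l]) / sigma**2).

```python
import itertools

import numpy as np

N = 7
EXC = (1, 4)  # hypothetical exception pair: item 4 beats item 1


def rbf_kernel(P, Q, sigma):
    """RBF kernel on concatenated one-hot pair codes, computed from item mismatches."""
    mismatches = (P[:, None, 0] != Q[None, :, 0]).astype(float) + (
        P[:, None, 1] != Q[None, :, 1]
    ).astype(float)
    return np.exp(-mismatches / sigma**2)


def labels(pairs):
    """Transitive labels (+1 if the first item ranks higher), flipped on the exception."""
    y = np.where(pairs[:, 0] < pairs[:, 1], 1.0, -1.0)
    is_exc = np.isin(pairs[:, 0], EXC) & np.isin(pairs[:, 1], EXC)
    return np.where(is_exc, -y, y)


def bandwidth_sweep(sigmas, lam=1e-3, n=N):
    """Per sigma: (sigma, exception score, rule accuracy on untrained pairs, max |f_test|)."""
    adjacent = [(k, k + 1) for k in range(n - 1)]
    train_list = adjacent + [(j, i) for i, j in adjacent] + [EXC, EXC[::-1]]
    train = np.array(train_list)
    test = np.array([p for p in itertools.permutations(range(n), 2) if p not in train_list])
    y = labels(train)
    rows = []
    for sigma in sigmas:
        alpha = np.linalg.solve(rbf_kernel(train, train, sigma) + lam * np.eye(len(train)), y)
        f_test = rbf_kernel(test, train, sigma) @ alpha
        f_exc = (rbf_kernel(np.array([EXC]), train, sigma) @ alpha)[0]
        gen_acc = float(np.mean(np.sign(f_test) == labels(test)))
        rows.append((sigma, float(f_exc), gen_acc, float(np.max(np.abs(f_test)))))
    return rows
```

As sigma shrinks, the kernel approaches the identity on distinct pairs. The exception is then memorized while untrained pairs are predicted near 0. Larger sigma restores similarity between pairs that share items. In a language model, the analogous knob would have to be representational, such as the embedding-dimension or synthetic-representation manipulations the authors propose.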
minor comments (1)
- [Task definition] Notation for the exception parameter and the geometry family should be introduced with a single consolidated table or figure early in the task-definition section to reduce cross-referencing.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed report. We address each major comment below, indicating planned revisions where appropriate.
Point-by-point responses
- Referee: [Abstract / analytical characterization] Abstract (paragraph on analytical characterization) and the modeling section: the central claim that 'successful generalization is sensitive to the specific representational geometry' is derived under kernel ridge regression with fixed features. Because KRR solves a convex problem in a predetermined feature space, it cannot capture representation learning under gradient descent; the manuscript does not demonstrate that the same geometry dependence survives once features are allowed to adapt, which is required to underwrite the mechanistic explanation offered for language-model behavior.
  Authors: We agree that the analytical results are derived exclusively for kernel ridge regression with fixed features, which permits the closed-form characterization across representations and task parameters. The manuscript presents KRR as a tractable model of neural network learning but does not provide analysis or experiments showing that the reported geometry sensitivity persists when features adapt under gradient descent. We will revise the abstract and modeling section to explicitly state this scope limitation and add discussion of the implications for the mechanistic account of language-model behavior.
  Revision: partial.
- Referee: [LM validation experiments] Validation section (LM experiments): the reported systematic mistakes in finetuned LMs are said to match the theory's predictions, yet the manuscript does not report controls that isolate representational geometry (e.g., by varying embedding dimensionality or kernel bandwidth while holding other factors fixed). Without such controls it remains unclear whether the observed errors are produced by the same mechanism identified in the KRR analysis.
  Authors: We acknowledge that the LM experiments do not include explicit controls that isolate representational geometry while holding other factors fixed. The reported error patterns are consistent with the KRR predictions, but alternative mechanisms cannot be ruled out without additional controls. We will revise the validation section to discuss this limitation and, where feasible, incorporate supplementary analyses (e.g., varying model embedding dimensions or using controlled synthetic representations) to better link the observations to the KRR mechanism.
  Revision: partial.
Unresolved after the rebuttal:
- Demonstrating that the geometry dependence survives under adaptive feature learning via gradient descent.
Circularity Check
No circularity: analytical KRR characterization is self-contained
Full rationale
The paper performs an analytical characterization of kernel ridge regression behavior on the transitive inference with exceptions task across representational geometries. This is a direct mathematical derivation from the KRR objective and kernel definitions rather than any fitting of parameters to target outputs followed by relabeling as prediction. No self-citations, uniqueness theorems, or ansatzes imported from prior author work are referenced as load-bearing. The subsequent empirical validation on language models is presented as an independent check, not part of the core derivation chain. Absent any quoted equations that reduce the claimed results to their inputs by construction, the derivation chain does not exhibit circularity.
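The claim that the characterization follows directly from the KRR objective can be checked numerically. For fixed features Phi, the objective 0.5 * ||Phi w - y||^2 + 0.5 * lam * ||w||^2 is strictly convex. Its unique minimizer is w* = Phi^T (Phi Phi^T + lam I)^{-1} y, which is the kernel closed form with K = Phi Phi^T, and plain gradient descent on w converges to it. The random features, sizes, step size and iteration count below are arbitrary illustrative choices. Because the features never change, this is exactly the fixed-feature regime whose limits the referee raises.

```python
import numpy as np

rng = np.random.default_rng(0)
phi = rng.standard_normal((8, 5))  # fixed features: 8 training inputs, 5 dimensions
y = rng.standard_normal(8)
lam = 0.1

# Kernel closed form: alpha = (K + lam I)^{-1} y, training predictions K alpha.
K = phi @ phi.T
alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
w_closed = phi.T @ alpha  # primal weights, via the push-through identity

# Gradient descent on the primal ridge objective with the features held fixed.
L = np.linalg.norm(phi, 2) ** 2 + lam  # Lipschitz constant of the gradient
w = np.zeros(phi.shape[1])
for _ in range(20000):
    grad = phi.T @ (phi @ w - y) + lam * w
    w -= grad / L

print(np.max(np.abs(w - w_closed)), np.max(np.abs(phi @ w - K @ alpha)))
```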