A mathematical theory of balancing relational generalization and memorization

Luke Cheng; Samuel Lippl

arxiv: 2605.22972 · v1 · pith:3PZZG5Q4new · submitted 2026-05-21 · 💻 cs.LG · cs.AI


Pith reviewed 2026-05-25 05:53 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords: transitive inference, relational generalization, exception memorization, kernel ridge regression, representational geometry, language models


The pith

Kernel ridge regression balances transitive inference and exception memorization only under specific representational geometries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the transitive inference with exceptions task to study how systems learn a general relational rule while also memorizing a single violation of that rule. It provides an analytical characterization of kernel ridge regression solutions over a family of input representations and task parameters. The analysis shows these models can achieve the desired balance, yet success requires particular geometric properties of the representations that are not needed in the exception-free case. The same pattern of generalization and errors appears when the theory is tested by finetuning pretrained language models on ordered relations.

Core claim

Kernel ridge regression models can solve transitive inference while correctly handling one exception provided the representational geometry separates the general rule from the exception in kernel space; the same models fail for other geometries even when the task parameters are held fixed.

What carries the argument

Analytical solution of kernel ridge regression on embeddings of ordered tuples that include one explicit exception to the transitive rule.
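
For orientation, the standard closed form of kernel ridge regression can be written as follows. This is textbook notation and not necessarily the paper's: the symbols $m$, $k$, $K$, $\lambda$, and $y$ are assumptions introduced here, and the paper may scale the ridge term differently.

$$
f(x) = k(x)^\top (K + \lambda I_m)^{-1} y, \qquad K_{ab} = k(x_a, x_b), \qquad k(x)_a = k(x, x_a)
$$

Here $x_1, \dots, x_m$ are the training inputs (representations of item pairs) and $y \in \mathbb{R}^m$ holds their labels, including the label of the exception. When $K$ is invertible and $\lambda \to 0$, $f$ fits every training label exactly, so the interesting question is how $f$ behaves on held-out pairs. That behavior depends on the kernel, and hence on the representational geometry.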

If this is right

Generalization on the task requires geometries in which the exception does not produce destructive interference with the transitive kernel structure.
Pretrained language models finetuned on the task will exhibit both rule-consistent generalization and the systematic mistakes that follow from the geometry analysis.
The presence of even one exception makes the problem mechanistically harder than standard transitive inference, because the geometry must now protect both the rule and the exception.
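
The geometry dependence described in the points above can be probed numerically. The following is a minimal sketch under stated assumptions, not the paper's setup: the task layout (adjacent pairs in both orders plus one flipped non-adjacent pair), the feature map, and the names `pair_features`, `conj`, and `run_task` are all hypothetical. The `conj` parameter scales a pair-specific unit and stands in for "representational geometry". Scoring every held-out pair against the transitive rule is a simplification, since the paper's Figure 2 notes that some pairs have no expected generalization.

```python
import numpy as np

def pair_features(j, k, n, conj):
    """Additive one-hot code for items j and k, plus one conjunctive unit
    (scaled by `conj`) that is unique to the ordered pair (j, k)."""
    phi = np.zeros(2 * n + n * n)
    phi[j] = 1.0                     # unit for the first item
    phi[n + k] = 1.0                 # unit for the second item
    phi[2 * n + j * n + k] = conj    # pair-specific unit
    return phi

def rule_label(j, k):
    """Transitive rule (assumed convention): +1 if j ranks above k."""
    return 1.0 if j < k else -1.0

def run_task(n=7, conj=0.5, ridge=1e-3, exception=(1, 4)):
    ej, ek = exception
    # The exception carries the opposite label to the transitive rule.
    exc = {(ej, ek): -rule_label(ej, ek), (ek, ej): -rule_label(ek, ej)}
    # Training set: adjacent pairs in both orders, plus the exception.
    train = {}
    for i in range(n - 1):
        train[(i, i + 1)] = 1.0
        train[(i + 1, i)] = -1.0
    train.update(exc)
    pairs = list(train)
    X = np.stack([pair_features(j, k, n, conj) for j, k in pairs])
    y = np.array([train[p] for p in pairs])
    # Kernel ridge regression with a linear kernel on the features.
    K = X @ X.T
    alpha = np.linalg.solve(K + ridge * np.eye(len(y)), y)

    def f(j, k):
        return float(pair_features(j, k, n, conj) @ X.T @ alpha)

    # Held-out pairs: everything that was not trained on.
    test = [(j, k) for j in range(n) for k in range(n)
            if j != k and (j, k) not in train]
    return {
        "train_rule_acc": float(np.mean(
            [np.sign(f(*p)) == train[p] for p in pairs if p not in exc])),
        "exception_acc": float(np.mean(
            [np.sign(f(*p)) == exc[p] for p in exc])),
        "test_rule_acc": float(np.mean(
            [np.sign(f(*p)) == rule_label(*p) for p in test])),
    }

if __name__ == "__main__":
    for conj in (0.1, 0.5, 2.0):
        print(conj, run_task(conj=conj))
```

Sweeping `conj` changes how much of the kernel is carried by pair-specific structure versus shared item structure, which is one simple way to vary geometry while keeping the task parameters fixed. Whether the resulting pattern matches the paper's analytical results is not something this sketch establishes.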

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same geometry sensitivity is likely to appear in other relational tasks that combine a dominant rule with isolated counterexamples.
Controlling embedding geometry during pretraining or finetuning could improve exception handling without sacrificing rule generalization.
Direct tests on transformer architectures rather than kernel proxies would clarify whether the predicted geometry dependence survives in modern networks.

Load-bearing premise

Kernel ridge regression behavior is representative of how neural networks learn on this relational task.

What would settle it

If language models finetuned on ordered relations with one exception neither generalize according to the transitive rule nor produce the specific error pattern predicted by the kernel analysis, the claimed link between geometry and performance would not hold.

Figures

Figures reproduced from arXiv: 2605.22972 by Luke Cheng, Samuel Lippl.

**Figure 2.** Transitive inference with exceptions. Illustration showing the relevant training and test sets as well as the associated relation. White squares reflect item pairs without an expected generalization. 1. Memorization of transitive pairs: Does the model learn the training pairs where at least one item is in O(i)? 2. Memorization of intransitive pairs: Does the model learn the training pairs where both items…

**Figure 3.** Illustration of the theorem. A, The ranking system consists of the original rank from TI, $r_{\mathrm{TI}}$, plus a perturbation $r_{\mathrm{pert}}$ (example here uses $\alpha = 0.2$). B, Example ranking systems (for $n = 9$, $p = 6$, $q = 4$, $\tilde{c} \to \infty$). C, Any exchangeable representation is equivalent, via an orthonormal change of basis, to a four-hot representation where two units represent each input item $x_j$ and $x_k$ separately, one unit represe…

**Figure 4.** Behavior of the kernel model across task space.

**Figure 5.** Finetuning pretrained language models (PLMs) on relational data with exceptions.

Original abstract

Humans, animals, and modern machine learning models exhibit impressive abilities to learn complex behaviors and generalize these behaviors to unseen situations. This ability requires us to learn rules and regularities that allow for such generalizations. At the same time, in most complex environments, any rule will have its exceptions. How do learning systems balance between learning general regularities and memorizing exceptions? We argue that a lack of task paradigms has hindered the study of this essential ability. To address this gap, we introduce a novel task, transitive inference with exceptions, that tests for relational generalization and memorization of an exception to the relational rule. We then analytically characterize the behavior of a simple, theoretically tractable model of neural network learning (kernel ridge regression) across a broad family of representations and task parameters. We find that these models can balance between relational generalization and memorization, but unlike for transitive inference without an exception, successful generalization is sensitive to the specific representational geometry. We explain why this task is more challenging mechanistically by drawing on our analytical theory. Finally, we validate our theoretical insights in pretrained language models that are finetuned on ordered relations, finding that these models successfully generalize according to the transitive rule, but also make the kinds of systematic mistakes predicted by our theory. Overall, our theory shows how learning systems can balance between relational generalization and memorization, explains how this can go wrong, and emphasizes the need for new task paradigms designed to probe this ability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces a clean new task for relational generalization with exceptions and derives explicit behavior for kernel ridge regression across geometries, but the results rest on a fixed-feature convex model whose predictions may not carry to networks that learn representations.


The core contribution is the transitive inference with exceptions task plus the closed-form analysis of how kernel ridge regression handles the rule-plus-exception tradeoff. That analysis shows why an exception makes success depend on the specific geometry of the representation, while plain transitive inference does not. The derivation is explicit and the task is well-motivated as a missing paradigm. The authors also run a quick check on finetuned language models and report that the error patterns line up with the KRR predictions, which is a useful sanity check. That combination of new task and analytical treatment is the part worth paying attention to.

The main limitation is the modeling choice itself. Kernel ridge regression works in a fixed feature space and solves a convex problem, so the geometry sensitivity it exhibits is derived under conditions that do not include feature adaptation or non-convex dynamics. The language-model experiments use pretrained models that are only finetuned, which does not test whether the same dependence appears once representations can change. If the geometry effect disappears or changes under gradient descent with learned features, the mechanistic story weakens. The paper is honest about using a tractable proxy, but the claim that the theory explains behavior in actual neural systems still needs more direct evidence.

This is aimed at researchers working on relational reasoning, generalization bounds, or cognitive models of inference. It is coherent on its own terms and the math is reproducible in principle, so it deserves a serious referee even if the KRR-to-network gap requires discussion in revision.

Referee Report

2 major / 1 minor

Summary. The paper introduces the transitive inference with exceptions task to probe how learning systems balance relational generalization against memorization of exceptions. It provides an analytical characterization of kernel ridge regression (KRR) across a family of representations and task parameters, concluding that these models can achieve the balance but that successful generalization becomes sensitive to representational geometry precisely when an exception is present (unlike the exception-free case). The authors explain the mechanistic source of this sensitivity, then show that the predicted error patterns appear when pretrained language models are finetuned on ordered relations.

Significance. If the KRR analysis is internally sound and the geometry-sensitivity predictions are borne out beyond the convex fixed-feature setting, the work supplies a concrete mathematical account of the generalization-memorization trade-off on relational tasks and a falsifiable link to observed LM behavior. The explicit analytical treatment of KRR and the direct comparison to LM error patterns are strengths that would be valuable to the community.

major comments (2)

[Abstract / analytical characterization] Abstract (paragraph on analytical characterization) and the modeling section: the central claim that 'successful generalization is sensitive to the specific representational geometry' is derived under kernel ridge regression with fixed features. Because KRR solves a convex problem in a predetermined feature space, it cannot capture representation learning under gradient descent; the manuscript does not demonstrate that the same geometry dependence survives once features are allowed to adapt, which is required to underwrite the mechanistic explanation offered for language-model behavior.
[LM validation experiments] Validation section (LM experiments): the reported systematic mistakes in finetuned LMs are said to match the theory's predictions, yet the manuscript does not report controls that isolate representational geometry (e.g., by varying embedding dimensionality or kernel bandwidth while holding other factors fixed). Without such controls it remains unclear whether the observed errors are produced by the same mechanism identified in the KRR analysis.
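
To make the second comment concrete, here is a rough sketch of what a bandwidth control could look like in a kernel proxy: the task and item embeddings are held fixed, and only the kernel bandwidth varies. Everything here is an assumption for illustration: the random Gaussian item embeddings, the Gaussian (RBF) kernel, the task layout, and the names `rbf_gram` and `bandwidth_control` are hypothetical and are not taken from the manuscript. It does not address language models directly.

```python
import numpy as np

def rbf_gram(A, B, bandwidth):
    """Gaussian kernel exp(-||a - b||^2 / (2 * bandwidth^2)) between rows."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))

def rule_label(j, k):
    """Transitive rule (assumed convention): +1 if j ranks above k."""
    return 1.0 if j < k else -1.0

def bandwidth_control(bandwidths, n=7, dim=16, ridge=1e-3,
                      exception=(1, 4), seed=0):
    rng = np.random.default_rng(seed)
    # Fixed random item embeddings, shared across all bandwidths.
    items = rng.standard_normal((n, dim)) / np.sqrt(dim)

    def embed(pairs):
        return np.stack([np.concatenate([items[j], items[k]])
                         for j, k in pairs])

    # Training set: adjacent pairs in both orders, plus one flipped pair.
    train = {}
    for i in range(n - 1):
        train[(i, i + 1)] = 1.0
        train[(i + 1, i)] = -1.0
    ej, ek = exception
    train[(ej, ek)] = -rule_label(ej, ek)
    train[(ek, ej)] = -rule_label(ek, ej)
    pairs = list(train)
    test = [(j, k) for j in range(n) for k in range(n)
            if j != k and (j, k) not in train]

    X_tr, y_tr = embed(pairs), np.array([train[p] for p in pairs])
    X_te, y_te = embed(test), np.array([rule_label(*p) for p in test])
    exc_idx = [pairs.index((ej, ek)), pairs.index((ek, ej))]

    rows = []
    for bw in bandwidths:
        K = rbf_gram(X_tr, X_tr, bw)
        alpha = np.linalg.solve(K + ridge * np.eye(len(y_tr)), y_tr)
        pred_tr = K @ alpha
        pred_te = rbf_gram(X_te, X_tr, bw) @ alpha
        rows.append({
            "bandwidth": bw,
            "exception_acc": float(np.mean(
                np.sign(pred_tr[exc_idx]) == y_tr[exc_idx])),
            "test_rule_acc": float(np.mean(np.sign(pred_te) == y_te)),
        })
    return rows

if __name__ == "__main__":
    for row in bandwidth_control([0.25, 0.5, 1.0, 2.0]):
        print(row)
```

Because the seed, items, and training set are shared across bandwidths, any change in the reported accuracies is attributable to the kernel alone, which is the kind of isolation the comment asks for.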

minor comments (1)

[Task definition] Notation for the exception parameter and the geometry family should be introduced with a single consolidated table or figure early in the task-definition section to reduce cross-referencing.

Simulated Authors' Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed report. We address each major comment below, indicating planned revisions where appropriate.


Referee: [Abstract / analytical characterization] Abstract (paragraph on analytical characterization) and the modeling section: the central claim that 'successful generalization is sensitive to the specific representational geometry' is derived under kernel ridge regression with fixed features. Because KRR solves a convex problem in a predetermined feature space, it cannot capture representation learning under gradient descent; the manuscript does not demonstrate that the same geometry dependence survives once features are allowed to adapt, which is required to underwrite the mechanistic explanation offered for language-model behavior.

Authors: We agree that the analytical results are derived exclusively for kernel ridge regression with fixed features, which permits the closed-form characterization across representations and task parameters. The manuscript presents KRR as a tractable model of neural network learning but does not provide analysis or experiments showing that the reported geometry sensitivity persists when features adapt under gradient descent. We will revise the abstract and modeling section to explicitly state this scope limitation and add discussion of the implications for the mechanistic account of language-model behavior. revision: partial
Referee: [LM validation experiments] Validation section (LM experiments): the reported systematic mistakes in finetuned LMs are said to match the theory's predictions, yet the manuscript does not report controls that isolate representational geometry (e.g., by varying embedding dimensionality or kernel bandwidth while holding other factors fixed). Without such controls it remains unclear whether the observed errors are produced by the same mechanism identified in the KRR analysis.

Authors: We acknowledge that the LM experiments do not include explicit controls that isolate representational geometry while holding other factors fixed. The reported error patterns are consistent with the KRR predictions, but alternative mechanisms cannot be ruled out without additional controls. We will revise the validation section to discuss this limitation and, where feasible, incorporate supplementary analyses (e.g., varying model embedding dimensions or using controlled synthetic representations) to better link the observations to the KRR mechanism. revision: partial

standing simulated objections not resolved

Demonstrating that the geometry dependence survives under adaptive feature learning via gradient descent

Circularity Check

0 steps flagged

No circularity: analytical KRR characterization is self-contained


The paper performs an analytical characterization of kernel ridge regression behavior on the transitive inference with exceptions task across representational geometries. This is a direct mathematical derivation from the KRR objective and kernel definitions rather than any fitting of parameters to target outputs followed by relabeling as prediction. No self-citations, uniqueness theorems, or ansatzes imported from prior author work are referenced as load-bearing. The subsequent empirical validation on language models is presented as an independent check, not part of the core derivation chain. Absent any quoted equations that reduce the claimed results to their inputs by construction, the derivation chain does not exhibit circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; full text required to populate the ledger.



Reference graph

Works this paper leans on

120 extracted references · 120 canonical work pages · 3 internal anchors

[1]

The discovery of structural form

Charles Kemp and Joshua B Tenenbaum. “The discovery of structural form”. In:Proceedings of the National Academy of Sciences105.31 (2008), pp. 10687–10692

work page 2008
[2]

Building machines that learn and think like people

Brenden M Lake et al. “Building machines that learn and think like people”. In:Behavioral and brain sciences40 (2017), e253

work page 2017
[3]

Connectionism and cognitive architecture: A critical analysis

Jerry A Fodor and Zenon W Pylyshyn. “Connectionism and cognitive architecture: A critical analysis”. In:Cognition28.1-2 (1988), pp. 3–71

work page 1988
[4]

Compositionality decomposed: How do neural networks generalise?

Dieuwke Hupkes et al. “Compositionality decomposed: How do neural networks generalise?” In:Journal of Artificial Intelligence Research67 (2020), pp. 757–795

work page 2020
[5]

Measuring Compositional Generalization: A Comprehensive Method on Realistic Data

Daniel Keysers et al. “Measuring Compositional Generalization: A Comprehensive Method on Realistic Data”. In:International Conference on Learning Representations. 2020.URL: https://openreview.net/forum?id=SygcCnNKwr

work page 2020
[6]

Representation of real- world event schemas during narrative perception

Christopher Baldassano, Uri Hasson, and Kenneth A Norman. “Representation of real- world event schemas during narrative perception”. In:Journal of Neuroscience38.45 (2018), pp. 9689–9699

work page 2018
[7]

David E Rumelhart and James L Mcclelland.On Learning the Past Tenses of English Verbs. Tech. rep. 1985

work page 1985
[8]

Relational knowledge: The foundation of higher cognition

Graeme S Halford, William H Wilson, and Steven Phillips. “Relational knowledge: The foundation of higher cognition”. In:Trends in cognitive sciences14.11 (2010), pp. 497–505

work page 2010
[9]

Relational inductive biases, deep learning, and graph networks

Peter W Battaglia et al. “Relational inductive biases, deep learning, and graph networks”. In: arXiv preprint arXiv:1806.01261(2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[10]

Judgment and reasoning in the child

J Piaget. “Judgment and reasoning in the child.” In: (1928)

work page 1928
[11]

Transitive inferences and memory in young children

Peter E Bryant and Thomas Trabasso. “Transitive inferences and memory in young children.” In:Nature(1971)

work page 1971
[12]

Asymmetric reinforcement learning facilitates human inference of transitive relations

Simon Ciranka et al. “Asymmetric reinforcement learning facilitates human inference of transitive relations”. In:Nature Human Behaviour6.4 (2022), pp. 555–564. 10

work page 2022
[13]

Neural knowledge assembly in humans and neural networks

Stephanie Nelli et al. “Neural knowledge assembly in humans and neural networks”. In: Neuron111.9 (2023), pp. 1504–1516

work page 2023
[14]

Are monkeys logical?

Brendan O McGonigle and Margaret Chalmers. “Are monkeys logical?” In:Nature267.5613 (1977), pp. 694–696

work page 1977
[15]

Transitive inference in rats (Rattus norvegicus)

Hank Davis. “Transitive inference in rats (Rattus norvegicus).” In:Journal of Comparative Psychology106.4 (1992), p. 342

work page 1992
[16]

Fish can infer social rank by observation alone

Logan Grosenick, Tricia S Clement, and Russell D Fernald. “Fish can infer social rank by observation alone”. In:Nature445.7126 (2007), pp. 429–432

work page 2007
[17]

Transitive inference in Polistes paper wasps

Elizabeth A Tibbetts et al. “Transitive inference in Polistes paper wasps”. In:Biology letters 15.5 (2019)

work page 2019
[18]

Transitive choices by a simple, fully con- nected, backpropagation neural network: implications for the comparative study of transitive inference

Carlo De Lillo, D Floreano, and F Antinucci. “Transitive choices by a simple, fully con- nected, backpropagation neural network: implications for the comparative study of transitive inference.” In:Animal Cognition4.1 (2001), pp. 61–68

work page 2001
[19]

A geometrical solution underlies general neural principle for serial ordering

Gabriele Di Antonio, Sofia Raglio, and Maurizio Mattia. “A geometrical solution underlies general neural principle for serial ordering”. In:Nature Communications15.1 (2024), p. 8238

work page 2024
[20]

Emergent neural dynamics and geometry for generalization in a transitive inference task

Kenneth Kay et al. “Emergent neural dynamics and geometry for generalization in a transitive inference task”. In:PLOS Computational Biology20.4 (2024), e1011954

work page 2024
[21]

A mathematical theory of relational generalization in transitive inference

Samuel Lippl et al. “A mathematical theory of relational generalization in transitive inference”. In:Proceedings of the National Academy of Sciences121.28 (2024), e2314511121

work page 2024
[22]

Relational reasoning and inductive bias in transformers and large language models

Jesse Geerts et al. “Relational reasoning and inductive bias in transformers trained on a transitive inference task”. In:arXiv preprint arXiv:2506.04289(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[23]

ReCogLab: a framework testing relational reasoning & cognitive hypothe- ses on LLMs

Andrew Liu et al. “ReCogLab: a framework testing relational reasoning & cognitive hypothe- ses on LLMs”. In:The Thirteenth International Conference on Learning Representations. 2025

work page 2025
[24]

Some properties of configural learning: an investigation of the transverse-patterning problem

Maria C Alvarado and Jerry W Rudy. “Some properties of configural learning: an investigation of the transverse-patterning problem.” In:Journal of Experimental Psychology: Animal Behavior Processes18.2 (1992), p. 145

work page 1992
[25]

Configural learning in humans: The transverse patterning problem

Robert S Astur and Robert J Sutherland. “Configural learning in humans: The transverse patterning problem”. In:Psychobiology26.3 (1998), pp. 176–182

work page 1998
[26]

The hippocampus and transverse patterning guided by olfactory cues

Jeffery A Dusek and Howard Eichenbaum. “The hippocampus and transverse patterning guided by olfactory cues.” In:Behavioral neuroscience112.4 (1998), p. 762

work page 1998
[27]

Accessed: 2026-05-20

CHESSFOX.Légal’s Mate. Accessed: 2026-05-20. n.d.URL: https://chessfox.com/ legals-mate/

work page 2026
[28]

Generalization without systematicity: On the composi- tional skills of sequence-to-sequence recurrent networks

Brenden Lake and Marco Baroni. “Generalization without systematicity: On the composi- tional skills of sequence-to-sequence recurrent networks”. In:International conference on machine learning. PMLR. 2018, pp. 2873–2882

work page 2018
[29]

Clevr: A diagnostic dataset for compositional language and elementary visual reasoning

Justin Johnson et al. “Clevr: A diagnostic dataset for compositional language and elementary visual reasoning”. In:Proceedings of the IEEE conference on computer vision and pattern recognition. 2017, pp. 2901–2910

work page 2017
[30]

Towards a Formal Theory of Representational Compositionality

Eric Elmoznino et al. “Towards a Formal Theory of Representational Compositionality”. In:Forty-second International Conference on Machine Learning. 2025.URL: https:// openreview.net/forum?id=fXCfT7ErvL

work page 2025
[31]

Break it down: Evidence for structural compositionality in neural networks

Michael Lepori, Thomas Serre, and Ellie Pavlick. “Break it down: Evidence for structural compositionality in neural networks”. In:Advances in Neural Information Processing Systems 36 (2023), pp. 42623–42660

work page 2023
[32]

Discovering modular solutions that generalize compositionally

Simon Schug et al. “Discovering modular solutions that generalize compositionally”. In: The Twelfth International Conference on Learning Representations. 2024.URL: https : //openreview.net/forum?id=H98CVcX1eh

work page 2024
[33]

Compositional generalization from first principles

Thaddäus Wiedemer et al. “Compositional generalization from first principles”. In:Advances in Neural Information Processing Systems36 (2023), pp. 6941–6960

work page 2023
[34]

Provable Compositional Generalization for Object-Centric Learn- ing

Thaddäus Wiedemer et al. “Provable Compositional Generalization for Object-Centric Learn- ing”. In:The Twelfth International Conference on Learning Representations. 2024.URL: https://openreview.net/forum?id=7VPTUWkiDQ

work page 2024
[35]

Interaction Asymmetry: A General Principle for Learning Composable Abstractions

Jack Brady et al. “Interaction Asymmetry: A General Principle for Learning Composable Abstractions”. In:The Thirteenth International Conference on Learning Representations. 2025.URL:https://openreview.net/forum?id=cCl10IU836. 11

work page 2025
[36]

On The Specialization of Neural Modules

Devon Jarvis et al. “On The Specialization of Neural Modules”. In:The Eleventh International Conference on Learning Representations. 2023.URL: https://openreview.net/forum? id=Fh97BDaR6I

work page 2023
[37]

The Tolman-Eichenbaum machine: unifying space and rela- tional memory through generalization in the hippocampal formation

James CR Whittington et al. “The Tolman-Eichenbaum machine: unifying space and rela- tional memory through generalization in the hippocampal formation”. In:Cell183.5 (2020), pp. 1249–1263

work page 2020
[38]

The relational bottleneck as an inductive bias for efficient abstraction

Taylor W Webb et al. “The relational bottleneck as an inductive bias for efficient abstraction”. In:Trends in Cognitive Sciences28.9 (2024), pp. 829–843

work page 2024
[39]

Transitive inference in non-human animals: An empirical and theoretical analysis

Marco Vasconcelos. “Transitive inference in non-human animals: An empirical and theoretical analysis”. In:Behavioural Processes78.3 (2008), pp. 313–334

work page 2008
[40]

Serial learning

Greg Jensen. “Serial learning.” In: (2017)

work page 2017
[41]

On the paradox of three random variables

Stanisław Trybuła. “On the paradox of three random variables”. In:Applicationes Mathemati- cae5.4 (1961), pp. 321–332

work page 1961
[42]

How vicious are cycles of intransitive choice?

Maya Bar-Hillel and Avishai Margalit. “How vicious are cycles of intransitive choice?” In: Theory and decision24.2 (1988), pp. 119–145

work page 1988
[43]

Santiago Soliveres and Eric Allan.Everything you always wanted to know about intransitive competition but were afraid to ask. 2018

work page 2018
[44]

Intransitivity in theory and in the real world

Alexander Y Klimenko. “Intransitivity in theory and in the real world”. In:Entropy17.6 (2015), pp. 4364–4412

work page 2015
[45]

Intransitive dice

Brian Conrey et al. “Intransitive dice”. In:Mathematics Magazine89.2 (2016), pp. 133–143

work page 2016
[46]

A difficulty in the concept of social welfare

Kenneth J Arrow. “A difficulty in the concept of social welfare”. In:Journal of political economy58.4 (1950), pp. 328–346

work page 1950
[47]

Information aggregation, rationality, and the Condorcet jury theorem

David Austen-Smith and Jeffrey S Banks. “Information aggregation, rationality, and the Condorcet jury theorem”. In:American political science review90.1 (1996), pp. 34–45

work page 1996
[48]

The topology of poker

Laurent Bartholdi and Roman Mikhailov. “The topology of poker”. In:Games and Economic Behavior(2025)

work page 2025
[49]

Neural tangent kernel: Convergence and generalization in neural networks

Arthur Jacot, Franck Gabriel, and Clément Hongler. “Neural tangent kernel: Convergence and generalization in neural networks”. In:Advances in neural information processing systems31 (2018)

work page 2018
[50]

On lazy training in differentiable programming

Lenaic Chizat, Edouard Oyallon, and Francis Bach. “On lazy training in differentiable programming”. In:Advances in neural information processing systems32 (2019)

work page 2019
[51]

Out-of-distribution generaliza- tion in kernel regression

Abdulkadir Canatar, Blake Bordelon, and Cengiz Pehlevan. “Out-of-distribution generaliza- tion in kernel regression”. In:Advances in Neural Information Processing Systems34 (2021), pp. 12600–12612

work page 2021
[52]

Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks

Abdulkadir Canatar, Blake Bordelon, and Cengiz Pehlevan. “Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks”. In:Nature communications12.1 (2021), p. 2914

work page 2021
[53]

Benign, tempered, or catastrophic: Toward a refined taxonomy of overfitting

Neil Mallinar et al. “Benign, tempered, or catastrophic: Toward a refined taxonomy of overfitting”. In:Advances in neural information processing systems35 (2022), pp. 1182– 1195

work page 2022
[54]

An Agnostic View on the Cost of Overfitting in (Kernel) Ridge Regression

Lijia Zhou et al. “An Agnostic View on the Cost of Overfitting in (Kernel) Ridge Regression”. In:The Twelfth International Conference on Learning Representations. 2024.URL: https: //openreview.net/forum?id=YrTI2Zu0dd

work page 2024
[55]

Predicting Kernel Regression Learning Curves from Only Raw Data Statistics

Dhruva Karkada et al. “Predicting Kernel Regression Learning Curves from Only Raw Data Statistics”. In:The Fourteenth International Conference on Learning Representations. 2026. URL:https://openreview.net/forum?id=nn5Vf6GEsV

work page 2026
[56]

Generalization on the unseen, logic reasoning and degree curriculum

Emmanuel Abbe et al. “Generalization on the unseen, logic reasoning and degree curriculum”. In:Journal of Machine Learning Research25.331 (2024), pp. 1–58

work page 2024
[57]

When does compositional structure yield compositional generalization? A kernel theory

Samuel Lippl and Kim Stachenfeld. “When does compositional structure yield compositional generalization? A kernel theory.” In:The Thirteenth International Conference on Learning Representations. 2025.URL:https://openreview.net/forum?id=FPBce2P1er

work page 2025
[58]

A kernel-based view of language model fine-tuning

Sadhika Malladi et al. “A kernel-based view of language model fine-tuning”. In:International Conference on Machine Learning. PMLR. 2023, pp. 23610–23641

work page 2023
[59]

Linearization Explains Fine-Tuning in Large Language Models

Zahra Rahimi Afzal et al. “Linearization Explains Fine-Tuning in Large Language Models”. In:The Thirty-ninth Annual Conference on Neural Information Processing Systems. 2026. URL:https://openreview.net/forum?id=tdwRIP6NG2. 12

work page 2026
[60]

Optimal Regularization can Mitigate Double Descent

Preetum Nakkiran et al. “Optimal Regularization can Mitigate Double Descent”. In:Interna- tional Conference on Learning Representations. 2021.URL: https://openreview.net/ forum?id=7R7fAoUygoa

work page 2021
[61]

The generalization error of random features regression: Precise asymptotics and the double descent curve

Song Mei and Andrea Montanari. “The generalization error of random features regression: Precise asymptotics and the double descent curve”. In:Communications on Pure and Applied Mathematics75.4 (2022), pp. 667–766

work page 2022
[62]

Overcoming catastrophic forgetting in neural networks

James Kirkpatrick et al. “Overcoming catastrophic forgetting in neural networks”. In:Pro- ceedings of the national academy of sciences114.13 (2017), pp. 3521–3526

work page 2017
[63]

Revisiting catastrophic forgetting in large language model tuning

Hongyu Li et al. “Revisiting catastrophic forgetting in large language model tuning”. In: Findings of the association for computational linguistics: EMNLP 2024. 2024, pp. 4297– 4308

work page 2024
[64]

Scaling Laws for Forgetting during Finetuning with Pretraining Data Injection

Louis Béthune et al. “Scaling Laws for Forgetting during Finetuning with Pretraining Data Injection”. In:Forty-second International Conference on Machine Learning. 2025.URL: https://openreview.net/forum?id=vWMij23BmQ

work page 2025
[65]

LoRA: Low-Rank Adaptation of Large Language Models

Edward J Hu et al. “LoRA: Low-Rank Adaptation of Large Language Models”. In:Interna- tional Conference on Learning Representations. 2022.URL: https://openreview.net/ forum?id=nZeVKeeFYf9

work page 2022
[66]

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron et al. “Llama: Open and efficient foundation language models”. In:arXiv preprint arXiv:2302.13971(2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[67]

Qwen Team. Qwen3.5: Towards Native Multimodal Agents. Feb. 2026. URL: https://qwen.ai/blog?id=qwen3.5

[68]

J St BT Evans, Julie L Barston, and Paul Pollard. “On the conflict between logic and belief in syllogistic reasoning”. In: Memory & cognition 11.3 (1983), pp. 295–306

[69]

Jonathan St BT Evans and Tania S Perry. “Belief bias in children’s reasoning.” In: Cahiers de Psychologie Cognitive/Current Psychology of Cognition (1995)

[70]

Ishita Dasgupta et al. “Language models show human-like content effects on reasoning tasks”. In: arXiv preprint arXiv:2207.07051 (2022)

[71]

Andrew K Lampinen et al. “Language models, like humans, show content effects on reasoning tasks”. In: PNAS nexus 3.7 (2024), pgae233

[72]

Wenya Wu and Weihong Deng. “Transitive Inference in Large Language Models and Prompting Intervention”. In: ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. 2025, pp. 1–5

[73]

Dan Biderman et al. “LoRA Learns Less and Forgets Less”. In: Transactions on Machine Learning Research (2024). Featured Certification. ISSN: 2835-8856. URL: https://openreview.net/forum?id=aloEru2qCG

[74]

Anna C Schapiro et al. “Complementary learning systems within the hippocampus: a neural network modelling approach to reconciling episodic memory with statistical learning”. In: Philosophical Transactions of the Royal Society B: Biological Sciences 372.1711 (2017)

[75]

Brenden M Lake and Marco Baroni. “Human-like systematic generalization through a meta-learning neural network”. In: Nature 623.7985 (2023), pp. 115–121

[76]

Clémentine C J Dominé et al. “Exact learning dynamics of deep linear networks with prior knowledge”. In: Journal of Statistical Mechanics: Theory and Experiment 2023.11 (2023), p. 114004

[77]

Ofir Press et al. “Measuring and narrowing the compositionality gap in language models”. In: Findings of the Association for Computational Linguistics: EMNLP 2023. 2023, pp. 5687–5711

[78]

Lukas Berglund et al. “The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A””. In: The Twelfth International Conference on Learning Representations. 2024. URL: https://openreview.net/forum?id=GPKTIktA0k

[79]

Bin Wu et al. “Adaptive compositional continual meta-learning”. In: International Conference on Machine Learning. PMLR. 2023, pp. 37358–37378

[80]

Florian Redhardt, Yassir Akram, and Simon Schug. “Scaling can lead to compositional generalization”. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems. 2025. URL: https://openreview.net/forum?id=hZt0daVIZi
