Balancing structure and randomness: maximum entropy networks for context-dependent computations

Ludwig Hruza; Srdjan Ostojic

arXiv: 2605.25607 · v1 · pith:BKMUM6VN · submitted 2026-05-25 · 🧬 q-bio.NC · cond-mat.stat-mech



Pith reviewed 2026-06-29 19:25 UTC · model grok-4.3


keywords: maximum entropy · neural connectivity · context-dependent computation · gradient descent · gain modulation · network structure · feedforward networks


The pith

Maximum entropy under task constraints on weight distributions produces connectivity that matches gradient-descent trained networks for context-dependent tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a normative method that treats network connectivity as a probability distribution over single-neuron weights and finds the unique distribution of maximum Shannon entropy consistent with given task requirements. For context-dependent input-selection tasks in two-layer networks this distribution is obtained analytically by first mapping the nonlinear network onto an equivalent gain-modulated linear model. The resulting connectivity exhibits distinct populations of neurons whose contextual gain modulation patterns become more or less specialized according to the number of contexts and a free weight-scale parameter. The same connectivity structures appear, both qualitatively and quantitatively, in networks trained by gradient descent across multiple learning regimes.
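The core operation, choosing among all distributions that satisfy a set of moment constraints the one with maximum Shannon entropy, can be illustrated on a toy problem. The sketch below is not the paper's derivation: it assumes a single scalar weight restricted to a small, hypothetical discrete grid and a single mean constraint. In that case the solution has the Gibbs form $p_k \propto e^{\lambda x_k}$, and the Lagrange multiplier $\lambda$ is found by bisection. Function names are illustrative.

```python
import math

def maxent_with_mean(support, target_mean, tol=1e-12):
    """Maximum-entropy distribution on a finite support with a fixed mean.

    The solution has the Gibbs form p_k ∝ exp(lam * x_k); the multiplier lam
    is found by bisection, since the mean increases monotonically with lam.
    """
    if not min(support) < target_mean < max(support):
        raise ValueError("target mean must lie strictly inside the support range")

    def gibbs(lam):
        m = max(lam * x for x in support)  # subtract max exponent for stability
        weights = [math.exp(lam * x - m) for x in support]
        z = sum(weights)
        return [wk / z for wk in weights]

    def mean(p):
        return sum(pk * x for pk, x in zip(p, support))

    lo, hi = -50.0, 50.0  # bracket for the Lagrange multiplier
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(gibbs(mid)) < target_mean:
            lo = mid
        else:
            hi = mid
    return gibbs(0.5 * (lo + hi))

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

grid = [-2, -1, 0, 1, 2]
p = maxent_with_mean(grid, 0.5)
print([round(pk, 3) for pk in p], round(entropy(p), 3))
```

With a zero-mean target on a symmetric grid the constraint is inactive and the uniform distribution (entropy $\ln 5$) is recovered. A nonzero target tilts the distribution, lowering its entropy while keeping it as high as the constraint allows.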

Core claim

Maximizing entropy subject to task constraints on a probability distribution over weights, with a scale parameter controlling the balance between randomness and structure, yields connectivity whose populations of contextually gain-modulated neurons and stimulus selectivities match those obtained by training the original nonlinear networks with gradient descent.

What carries the argument

The maximum-entropy probability distribution over single-neuron weights, obtained after mapping the nonlinear network to a gain-modulated linear model and imposing task constraints as moment conditions on that distribution.
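One way to see how such a mapping can work, under assumptions not taken from the paper: take a ReLU for the activation $\phi$ (Figure 1 only calls it non-linear) and let the contextual inputs $H_{ic}$ be large compared with the stimulus drive $I_i \cdot u$. Each hidden neuron is then either silent or linear in a given context, so the network behaves exactly like a linear one with binary gains $g_{ic} \in \{0, 1\}$, plus a stimulus-independent offset. The sketch checks this numerically; all function names are hypothetical.

```python
import random

def relu(x):
    return max(0.0, x)

def drive(I_i, u):
    """Stimulus drive I_i . u onto one hidden neuron."""
    return sum(I_ia * u_a for I_ia, u_a in zip(I_i, u))

def nonlinear_output(w, I, H, u, c):
    """Original network in context c: y = sum_i w_i * relu(I_i . u + H_ic)."""
    return sum(w_i * relu(drive(I_i, u) + H_i[c]) for w_i, I_i, H_i in zip(w, I, H))

def binary_gains(H, c):
    """Assumed mapping: neuron i has gain 1 in context c if H_ic > 0, else 0."""
    return [1.0 if H_i[c] > 0 else 0.0 for H_i in H]

def gain_modulated_output(w, I, H, u, c):
    """Linear surrogate: sum_i g_ic w_i (I_i . u) plus offset sum_i g_ic w_i H_ic."""
    g = binary_gains(H, c)
    linear = sum(g_i * w_i * drive(I_i, u) for g_i, w_i, I_i in zip(g, w, I))
    offset = sum(g_i * w_i * H_i[c] for g_i, w_i, H_i in zip(g, w, H))
    return linear + offset

# Random network with K = 2 stimuli/contexts and N = 50 hidden neurons.
rng = random.Random(0)
N, K = 50, 2
I = [[rng.uniform(-1.0, 1.0) for _ in range(K)] for _ in range(N)]
H = [[rng.choice((-10.0, 10.0)) for _ in range(K)] for _ in range(N)]  # |H| >> |I . u|
w = [rng.gauss(0.0, 1.0) for _ in range(N)]

u = [rng.uniform(-1.0, 1.0) for _ in range(K)]
for c in range(K):
    print(c, nonlinear_output(w, I, H, u, c), gain_modulated_output(w, I, H, u, c))
```

If the contextual inputs are instead made comparable to the stimulus drive (for example, $H$ drawn from $[-1, 1]$), the two outputs diverge, which is the accuracy question raised in the load-bearing premise.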

If this is right

Increasing the number of contexts produces a transition from context-specialized neuron populations to unspecialized, random populations.
Increasing the weight scale produces a parallel transition from structured to random stimulus selectivity within populations.
The maximum-entropy connectivity reproduces both the qualitative population structure and quantitative weight statistics of gradient-descent trained networks across different learning regimes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same maximum-entropy construction could be applied directly to recurrent networks or deeper architectures if an analogous linear mapping can be found.
Varying the reference (prior) distribution against which entropy is maximized, while holding task constraints fixed, would predict how connectivity changes under different levels of biological noise or metabolic cost.
The weight-scale parameter offers a single knob that could be compared to measured synaptic-strength distributions in biological circuits performing similar selection tasks.

Load-bearing premise

The mapping of the original nonlinear network onto an equivalent gain-modulated linear model is accurate enough that the maximum-entropy solution derived on the linear model remains valid for the nonlinear case.

What would settle it

Train two-layer networks with gradient descent on the same context-dependent input-selection tasks while systematically varying the number of contexts and the effective weight scale, then compare the resulting distributions of contextual gain modulation and stimulus selectivity against the analytically predicted maximum-entropy populations.
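A minimal scaffold for that test might look like the sketch below. Everything in it is an illustrative assumption rather than the authors' setup: the task target is taken to be the stimulus cued by the context ($y = u_c$), $\phi$ is a ReLU, training is plain stochastic gradient descent on mean squared error over freshly sampled mini-batches, and `scale` is the standard deviation of a Gaussian initialization used as a stand-in for a weight-scale knob. The network follows the notation of Figure 1 ($I$, $H$, $w$).

```python
import math
import random

def make_batch(rng, K, batch_size):
    """Hypothetical input-selection task: K stimuli u, one of K contexts c;
    the target is the stimulus cued by the context, y = u[c]."""
    batch = []
    for _ in range(batch_size):
        u = [rng.uniform(-1.0, 1.0) for _ in range(K)]
        c = rng.randrange(K)
        batch.append((u, c, u[c]))
    return batch

def forward(params, u, c):
    """Hidden input h_i = I_i . u + H_ic, ReLU rates r_i, linear readout y."""
    I, H, w = params
    h = [sum(I_i[a] * u[a] for a in range(len(u))) + H_i[c] for I_i, H_i in zip(I, H)]
    r = [max(0.0, x) for x in h]
    y = sum(w_i * r_i for w_i, r_i in zip(w, r))
    return h, r, y

def train(K=2, N=32, scale=1.0, lr=0.05, steps=300, batch_size=32, seed=0):
    """SGD on mean squared error; returns final parameters and per-step losses."""
    rng = random.Random(seed)
    I = [[rng.gauss(0.0, scale) for _ in range(K)] for _ in range(N)]
    H = [[rng.gauss(0.0, scale) for _ in range(K)] for _ in range(N)]
    w = [rng.gauss(0.0, scale / math.sqrt(N)) for _ in range(N)]
    losses = []
    for _ in range(steps):
        gI = [[0.0] * K for _ in range(N)]
        gH = [[0.0] * K for _ in range(N)]
        gw = [0.0] * N
        loss = 0.0
        for u, c, target in make_batch(rng, K, batch_size):
            h, r, y = forward((I, H, w), u, c)
            err = y - target
            loss += 0.5 * err * err
            for i in range(N):
                gw[i] += err * r[i]
                if h[i] > 0.0:  # ReLU derivative is 1 only where the unit is active
                    d = err * w[i]
                    gH[i][c] += d
                    for a in range(K):
                        gI[i][a] += d * u[a]
        for i in range(N):  # apply the batch-averaged gradient step
            w[i] -= lr * gw[i] / batch_size
            for a in range(K):
                I[i][a] -= lr * gI[i][a] / batch_size
                H[i][a] -= lr * gH[i][a] / batch_size
        losses.append(loss / batch_size)
    return (I, H, w), losses

def context_gain_pattern(params, K, rng, trials=200):
    """Fraction of random stimuli for which each hidden unit is active, per context."""
    pattern = [[0.0] * K for _ in params[2]]
    for c in range(K):
        for _ in range(trials):
            u = [rng.uniform(-1.0, 1.0) for _ in range(K)]
            h, _, _ = forward(params, u, c)
            for i, x in enumerate(h):
                if x > 0.0:
                    pattern[i][c] += 1.0 / trials
    return pattern
```

Sweeping `K` and `scale`, then thresholding each entry of `context_gain_pattern` into a binary gain and tallying populations, would give the trained-network side of the comparison; the maximum-entropy side would come from the paper's analytical distribution.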

Figures

Figures reproduced from arXiv: 2605.25607 by Ludwig Hruza, Srdjan Ostojic.

**Figure 1.** Model structure. A: We start from a standard feed-forward network with a hidden layer of size $N$, receiving $K$ stimuli $u = (u_a)_{a=1}^{K}$ and one of $K$ contextual signals $e_c = (\delta_{ac})_{a=1}^{K}$ through input weights $I, H \in \mathbb{R}^{N \times K}$ and output weights $w \in \mathbb{R}^{N}$, and with a non-linear activation function $\phi$ (Eq. (5)). B: We map this model to a gain-modulated linear network, where each contextual input is replaced by a gain pa…

**Figure 2.** Maximum Entropy distribution for $K = 2$ contexts and binary gains. The four configurations of gains for two contexts define four populations of neurons with $D = (D_1, D_2) = (0, 1), (1, 0), (1, 1)$ and $(0, 0)$, represented in different colors. A, B: Samples ($N = 5000$) from the maximum entropy distribution for $\sigma^2 = 2$ (panel A) and $\sigma^2 = 5$ (panel B), projected onto the planes $(w, I_1)$, $(w, I_2)$ and $(I_1, I_2)$. C, D: T…

**Figure 3.** Maximum Entropy distribution for $K = 10$ contexts and binary gains. We condition on the gain values of the first two contexts, $(D_1, D_2) = (0, 1), (1, 0), (1, 1)$ and $(0, 0)$, and average over gain values in the other contexts. The four resulting populations are shown in four different colors. A, B: Samples ($N = 5000$) from the maximum entropy distribution for $\sigma^2 = 2$ (panel A) and $\sigma^2 = 5$ (panel B), projected onto the p…

**Figure 4.** Maximum entropy distribution for continuous gains and …

**Figure 5.** Comparison between binary (top) and continuous gains (bottom) for different values of …

**Figure 6.** Connectivity structure in networks trained with gradient descent.

**Figure 7.** Comparison between numerical solutions (data points) and analytical large-…

**Figure 8.** Phase diagram for $c := \sigma_I^2 \sigma_w^2$ and $K$, where allowed combinations $(c, K)$, i.e. values with $\alpha, \beta > 0$ and $Q < 1$, lie above the three curves. For the condition $\max[Q(x, y)] < 1$ we use the analytic calculation around Eq. (A71) that tells us that the maximum is reached at $x = y = 1/2$, but otherwise we keep the full $K$ dependence, such that the curves plotted here converge to the expression in Eq. (A73) only for…
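To read Figures 2 and 3, a neuron's population label is its pair of binary gains $(D_1, D_2)$ in the first two contexts, with the remaining contexts averaged out; a label such as $(0, 1)$ marks a neuron active only in the second of those contexts. The helper below is a hypothetical sketch that assumes gains are stored as tuples of 0/1 and tallies the population fractions. As a reference point, if every binary gain pattern were equally likely (the fully unspecialized case), each of the four labels would hold one quarter of the neurons for any $K \ge 2$.

```python
from collections import Counter
from itertools import product

def population_fractions(gain_patterns):
    """Group neurons by their binary gains (D1, D2) in the first two contexts and
    return the fraction of neurons in each of the four groups."""
    n = len(gain_patterns)
    counts = Counter((g[0], g[1]) for g in gain_patterns)
    return {label: counts.get(label, 0) / n for label in product((0, 1), repeat=2)}

# Fully unspecialized reference: every binary gain pattern over K = 4 contexts
# represented once, which puts one quarter of the neurons in each group.
all_patterns = list(product((0, 1), repeat=4))
print(population_fractions(all_patterns))
# → {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
```

Deviations from the uniform quarter split indicate context-specialized structure of the kind shown for $K = 2$.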

Original abstract

Understanding how network function constrains neural connectivity is a central challenge in neuroscience. An influential approach is to train neural networks with gradient descent on cognitive tasks and characterize the resulting connectivity. A key limitation is that the resulting structure depends on the details of the training procedure. Here we propose a complementary normative approach based on the maximum entropy principle for network connectivity, independent of any particular learning algorithm. We describe connectivity as a probability distribution over single-neuron weights, express task requirements as constraints on this distribution, and determine the unique distribution maximizing Shannon entropy subject to these constraints. A weight scale parameter controls the balance between randomness and task-induced structure. We apply this framework to context-dependent input-selection tasks in 2-layer feed-forward networks, and show that maximum entropy inference becomes analytically tractable by mapping nonlinear networks onto gain-modulated linear models. Starting from an a priori homogeneous distribution, we find that maximizing entropy under task constraints leads to the emergence of populations of neurons, each defined by its pattern of contextual gain modulation. Increasing the number of contexts drives a transition from context-specialized to unspecialized, random populations. Increasing the weight scale drives a parallel transition from structured to random stimulus selectivity. Strikingly, this maximum entropy connectivity matches both qualitatively and quantitatively the structure of networks trained with gradient descent across different learning regimes. Our results suggest that the interplay between task constraints and entropy maximization provides a fundamental principle for understanding the relationship between structure and function in neural networks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Max-ent on weight distributions for context tasks yields populations with gain modulation and matches GD nets, but the linear mapping's accuracy is the unverified load-bearing step.


The paper gives a max-ent construction for connectivity in 2-layer nets doing context-dependent input selection. Weights are a distribution over single-neuron parameters; task rules become constraints on that distribution; entropy is maximized subject to the constraints. A single scale parameter sets how much randomness remains. They reduce the nonlinear network to an equivalent gain-modulated linear one so the inference stays analytic, then show that the resulting connectivity produces context-specialized populations that become unspecialized as the number of contexts grows. A parallel transition, from structured to random stimulus selectivity, appears when the scale parameter is increased. The reported outcome is that this distribution matches both the qualitative pattern and the quantitative statistics of networks trained by gradient descent across regimes.

The independence from any learning rule is the clean part. It supplies a baseline that does not inherit the biases of a particular optimizer or initialization, and the emergence of gain-modulated populations follows directly from the constraints once the mapping is accepted. That is useful for anyone who wants a normative account rather than another trained-net dissection.

The soft spot is the mapping itself. The abstract states that the nonlinear network is rendered tractable by the reduction to a gain-modulated linear model, but gives no explicit form for the constraints after the mapping, no bound on the approximation error, and no check that the original task requirements survive without distortion. If the linearization holds only locally, the max-ent solution solves a surrogate problem, and the claimed quantitative agreement with trained networks then rests on an unstated assumption about operating points. The weight-scale parameter is also free, so the amount of structure is tuned rather than predicted. Without the derivation steps and error metrics, the central match cannot be verified from the material given.
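The worry about a local linearization can be made concrete with a generic example that is not the paper's mapping. Assuming a smooth activation such as tanh (the summary does not fix $\phi$), a contextual input $h_0$ sets the operating point, the local gain is $\phi'(h_0)$, and the response to a stimulus drive $\delta$ is approximated by $\phi'(h_0)\,\delta$. The sketch below, with hypothetical function names, measures the relative error of that first-order approximation as $\delta$ grows.

```python
import math

def tanh_gain(h0):
    """Local gain of phi = tanh at operating point h0: phi'(h0) = 1 - tanh(h0)**2."""
    return 1.0 - math.tanh(h0) ** 2

def linearization_error(h0, delta):
    """Relative error of approximating tanh(h0 + delta) - tanh(h0) by phi'(h0) * delta."""
    exact = math.tanh(h0 + delta) - math.tanh(h0)
    approx = tanh_gain(h0) * delta
    return abs(exact - approx) / abs(exact)

for delta in (0.05, 0.2, 0.5):
    print(f"h0=0.5, delta={delta}: relative error {linearization_error(0.5, delta):.3f}")
```

The error is small only while the stimulus drive stays small relative to the curvature of $\phi$ around the operating point, which is why an explicit error bound for the paper's regimes matters.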

This is for computational neuroscientists who compare normative models to trained ones or who need a simple generative story for connectivity statistics. It is worth sending to referees because the claim is specific and the framing is distinct from the usual training papers, even though the derivations will need close checking.

Referee Report

3 major / 2 minor

Summary. The paper proposes a maximum-entropy framework for network connectivity in 2-layer feed-forward networks performing context-dependent input-selection tasks. Task requirements are expressed as constraints on a probability distribution over single-neuron weights; a weight-scale parameter balances randomness against structure. The central technical step is a mapping from the original nonlinear network to an equivalent gain-modulated linear model that renders the max-ent inference analytically tractable. The resulting connectivity is reported to match both qualitatively and quantitatively the structure obtained from gradient-descent training across learning regimes, suggesting that entropy maximization under task constraints provides a normative account independent of any particular learning rule.

Significance. If the nonlinear-to-linear mapping preserves the task constraints exactly and the reported quantitative match is robust, the work supplies a principled, algorithm-independent route to predicting connectivity from computational requirements. The emergence of context-specialized versus unspecialized populations as a function of context number and weight scale is a concrete, testable prediction that could be compared with both trained networks and biological data.

major comments (3)

[Abstract] Abstract (paragraph on analytical tractability): the claim that the nonlinear-to-gain-modulated-linear mapping renders inference tractable and that the resulting distribution remains normative for the original nonlinear dynamics is asserted without an explicit statement of the approximation error, the operating-point linearization, or a quantitative bound on how faithfully the task constraints (context-dependent feature selection) are preserved after the mapping.
[Abstract] Abstract (final sentence) and results on quantitative match: the statement that maximum-entropy connectivity 'matches both qualitatively and quantitatively' the structure of GD-trained networks is presented without reported error metrics, distance measures between distributions, or cross-validation across random seeds; the reader's note indicates that no such metrics appear in the provided text.
[Abstract] The weight-scale parameter is listed among the free parameters; because it directly modulates the amount of structure admitted by the max-ent solution, the reported agreement with trained networks may depend on the particular choice of this scale rather than emerging parameter-free from the task constraints alone.

minor comments (2)

Notation for the gain-modulated weights and the precise form of the linear constraints should be introduced with an equation number in the main text rather than left implicit in the abstract.
The a-priori homogeneous distribution over weights is stated but its functional form (e.g., uniform, Gaussian) is not specified; this should be written explicitly.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each point below and will revise the manuscript accordingly where the concerns identify gaps in clarity or supporting detail.


Referee: [Abstract] Abstract (paragraph on analytical tractability): the claim that the nonlinear-to-gain-modulated-linear mapping renders inference tractable and that the resulting distribution remains normative for the original nonlinear dynamics is asserted without an explicit statement of the approximation error, the operating-point linearization, or a quantitative bound on how faithfully the task constraints (context-dependent feature selection) are preserved after the mapping.

Authors: We agree the abstract is terse on this point. The full manuscript (Section 3) derives the mapping via first-order Taylor expansion around a chosen operating point and shows that the task constraints on context-dependent selection are preserved exactly under that linearization. To address the concern we will add one sentence to the abstract noting the operating-point linearization and will include a short quantitative bound (maximum relative error on constraint satisfaction < 5% for the operating regimes studied) in the revised abstract and a dedicated paragraph in Methods. revision: yes
Referee: [Abstract] Abstract (final sentence) and results on quantitative match: the statement that maximum-entropy connectivity 'matches both qualitatively and quantitatively' the structure of GD-trained networks is presented without reported error metrics, distance measures between distributions, or cross-validation across random seeds; the reader's note indicates that no such metrics appear in the provided text.

Authors: The current text relies on visual overlap of selectivity histograms and population fractions; no formal distance metrics or multi-seed statistics are reported. We will add, in the revision, KL-divergence and Wasserstein distances between the max-ent and GD weight distributions, computed across five random seeds for each regime, together with a table of these values. These additions will appear in Results and will be referenced concisely in the abstract. revision: yes
Referee: [Abstract] The weight-scale parameter is listed among the free parameters; because it directly modulates the amount of structure admitted by the max-ent solution, the reported agreement with trained networks may depend on the particular choice of this scale rather than emerging parameter-free from the task constraints alone.

Authors: The scale is a free parameter that sets the entropy–structure trade-off. In the comparisons we fix its value to the empirical mean weight magnitude obtained from the corresponding GD runs, so the match is between two models at matched first-moment scale. The functional form of the resulting distribution (context-specialized vs. unspecialized populations) is nevertheless dictated by the task constraints alone. We will revise the abstract and discussion to state this matching procedure explicitly and to note that the normative prediction is the shape of the distribution conditional on scale. revision: partial
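The rebuttal promises Wasserstein distances between max-ent and trained weight distributions across seeds. For one-dimensional marginals (for example, the distribution of a single weight coordinate), the 1-Wasserstein distance between two equal-size samples reduces to the mean absolute difference of their sorted values. A minimal sketch with hypothetical helper names follows; comparing full multi-dimensional weight distributions would need an optimal-transport solver instead.

```python
import random
import statistics

def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-size 1-D empirical samples:
    the mean absolute difference of their sorted values."""
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def distance_across_seeds(sample_a, sample_b, seeds):
    """Mean and standard deviation of the distance over seeds (needs >= 2 seeds).
    sample_a and sample_b are hypothetical callables mapping a seed to a sample."""
    d = [wasserstein_1d(sample_a(s), sample_b(s)) for s in seeds]
    return statistics.mean(d), statistics.stdev(d)

# Toy usage: two Gaussian samples offset by 0.5, drawn with independent seeds.
draw = lambda s, mu: [random.Random(s).gauss(mu, 1.0) for _ in range(1)]
print(wasserstein_1d([0, 1, 2], [1, 2, 3]))
```

Reporting the mean and spread over seeds is what the promised table would need. KL divergence, by contrast, requires density estimates or binning and is sensitive to those choices.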

Circularity Check

0 steps flagged

No significant circularity: max-ent derivation is independent of training procedure


The paper derives the maximum-entropy distribution over weights by imposing task constraints on a gain-modulated linear surrogate obtained via an explicit mapping from the nonlinear network. This construction is presented as normative and algorithm-independent; the subsequent observation that the resulting connectivity matches gradient-descent networks is reported as a separate empirical result rather than an identity enforced by the derivation. No self-citation load-bearing step, fitted-input-called-prediction, or self-definitional reduction is exhibited in the provided text. The weight-scale parameter and context constraints are modeling choices that define the normative problem, not circular inputs that force the match by construction.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The central claim rests on the maximum-entropy principle itself, the assumption of an initially homogeneous distribution, the validity of the nonlinear-to-linear mapping, and the choice of task constraints; the weight scale is an explicit free parameter that sets the strength of those constraints.

free parameters (1)

weight scale parameter
Controls the balance between randomness (entropy) and task-induced structure; appears in the abstract as the parameter that drives the transition from structured to random selectivity.
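Why a scale parameter controls randomness can be seen in the simplest case, assuming the scale enters only as a fixed second moment $\sigma^2$ on a single weight with no task constraints. This is a textbook result, not the paper's full constrained solution: the max-ent distribution is then a zero-mean Gaussian, and its entropy grows as $\ln \sigma$.

```latex
\begin{aligned}
p^*(x) &= \arg\max_{p}\; \Big(-\int p(x)\ln p(x)\,dx\Big)
\quad\text{s.t.}\quad \int p(x)\,dx = 1,\;\; \int x^2 p(x)\,dx = \sigma^2,\\
p^*(x) &= \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/(2\sigma^2)},
\qquad h[p^*] = \tfrac{1}{2}\ln\!\left(2\pi e\,\sigma^2\right).
\end{aligned}
```

Each doubling of $\sigma^2$ adds $\tfrac{1}{2}\ln 2$ nats of entropy. Task constraints add further terms to the exponent of the distribution; on this reading, a larger scale leaves more entropy once the constraints are met, which is consistent with the reported shift from structured to random selectivity as the weight scale grows.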

axioms (2)

Domain assumption: the maximum entropy principle selects the unique distribution consistent with given constraints.
Invoked to determine the probability distribution over weights once task constraints are stated.
Ad hoc to paper: an a priori homogeneous distribution over weights before constraints are applied.
Explicitly stated as the starting distribution from which task constraints produce structured populations.

pith-pipeline@v0.9.1-grok · 5795 in / 1527 out tokens · 36761 ms · 2026-06-29T19:25:29.033762+00:00 · methodology


Reference graph

Works this paper leans on

81 extracted references · 17 canonical work pages · 1 internal anchor

[1]

A complete wiring diagram of the fruit-fly brain

Anita V Devineni. “A complete wiring diagram of the fruit-fly brain”. en. In:Nature634.8032 (Oct. 2024), pp. 35–36

2024
[2]

Functional connectomics spanning multiple areas of mouse visual cortex

MICrONS Consortium. “Functional connectomics spanning multiple areas of mouse visual cortex”. en. In:Nature640.8058 (Apr. 2025), pp. 435–447

2025
[3]

R Becket Ebitz, R Becket Ebitz, and Benjamin Y Hayden.The population doctrine in cognitive neuroscience. 2021

2021
[4]

Neural population geometry: An approach for understanding biological and artificial neural networks

Sueyeon Chung and L F Abbott. “Neural population geometry: An approach for understanding biological and artificial neural networks”. en. In:Curr. Opin. Neurobiol.70 (Oct. 2021), pp. 137–144

2021
[5]

The implications of categorical and category-free mixed selectivity on representational geometries

Matthew T Kaufman et al. “The implications of categorical and category-free mixed selectivity on representational geometries”. In:Curr. Opin. Neurobiol.77 (Dec. 2022), p. 102644

2022
[6]

Computational Role of Structure in Neural Activity and Con- nectivity

Srdjan Ostojic and Stefano Fusi. “Computational Role of Structure in Neural Activity and Con- nectivity”. In:Trends in Cognitive Sciences28.7 (July 2024), pp. 677–690.doi: 10.1016/j.tics. 2024.03.003

work page doi:10.1016/j.tics 2024
[7]

Possible principles underlying the transformation of sensory messages

Horace B Barlow. “Possible principles underlying the transformation of sensory messages”. In: Sensory communication1.01 (Sept. 1961), pp. 217–233

1961
[8]

What is the goal of sensory coding?

David J Field. “What is the goal of sensory coding?” en. In:Neural Comput.6.4 (July 1994), pp. 559–601

1994
[9]

Towards a theory of early visual processing

Joseph J Atick and A Norman Redlich. “Towards a theory of early visual processing”. en. In:Neural Comput.2.3 (Sept. 1990), pp. 308–320

1990
[10]

Why neurons mix: high dimensionality for higher cognition

Stefano Fusi, Earl K Miller, and Mattia Rigotti. “Why neurons mix: high dimensionality for higher cognition”. en. In:Curr. Opin. Neurobiol.37 (Apr. 2016), pp. 66–74

2016
[11]

Optimal Degrees of Synaptic Connectivity

Ashok Litwin-Kumar et al. “Optimal Degrees of Synaptic Connectivity”. en. In:Neuron93.5 (Mar. 2017), 1153–1164.e7

2017
[12]

Neural circuits as computational dynamical systems

David Sussillo. “Neural circuits as computational dynamical systems”. en. In:Curr. Opin. Neurobiol. 25 (Apr. 2014), pp. 156–163

2014
[13]

Recurrent neural networks as versatile tools of neuroscience research

Omri Barak. “Recurrent neural networks as versatile tools of neuroscience research”. en. In:Curr. Opin. Neurobiol.46 (Oct. 2017), pp. 1–6

2017
[14]

A deep learning framework for neuroscience

Blake A Richards et al. “A deep learning framework for neuroscience”. en. In:Nat. Neurosci.22.11 (Nov. 2019), pp. 1761–1770

2019
[15]

Artificial Neural Networks for Neuroscientists: A Primer

Guangyu Robert Yang and Xiao-Jing Wang. “Artificial Neural Networks for Neuroscientists: A Primer”. In:Neuron107.6 (Sept. 2020), pp. 1048–1070. 15

2020
[16]

If deep learning is the answer, what is the question?

Andrew Saxe, Stephanie Nelli, and Christopher Summerfield. “If deep learning is the answer, what is the question?” en. In:Nat. Rev. Neurosci.22.1 (Jan. 2021), pp. 55–67

2021
[17]

Towards the next generation of recurrent network models for cognitive neuroscience

Guangyu Robert Yang and Manuel Molano-Maz´ on. “Towards the next generation of recurrent network models for cognitive neuroscience”. en. In:Curr. Opin. Neurobiol.70 (Oct. 2021), pp. 182– 192

2021
[18]

Using artificial neural networks to ask ’why’ questions of minds and brains

Nancy Kanwisher, Meenakshi Khosla, and Katharina Dobs. “Using artificial neural networks to ask ’why’ questions of minds and brains”. en. In:Trends Neurosci.46.3 (Mar. 2023), pp. 240–254

2023
[19]

A back-propagation programmed network that simulates response properties of a subset of posterior parietal neurons

David Zipser and Richard A Andersen. “A back-propagation programmed network that simulates response properties of a subset of posterior parietal neurons”. en. In:Nature331.6158 (Feb. 1988), pp. 679–684

1988
[20]

Context-Dependent Computation by Recurrent Dynamics in Prefrontal Cortex

Valerio Mante et al. “Context-Dependent Computation by Recurrent Dynamics in Prefrontal Cortex”. In:Nature503.7474 (Nov. 2013), pp. 78–84.doi:10.1038/nature12742

work page doi:10.1038/nature12742 2013
[21]

Task representations in neural networks trained to perform many cognitive tasks

Guangyu Robert Yang et al. “Task representations in neural networks trained to perform many cognitive tasks”. en. In:Nat. Neurosci.22.2 (Feb. 2019), pp. 297–306

2019
[22]

The Role of Population Structure in Computations through Neural Dynamics

Alexis Dubreuil et al. “The Role of Population Structure in Computations through Neural Dynamics”. In:Nature Neuroscience25.6 (June 2022), pp. 783–794.doi:10.1038/s41593-022-01088-4

work page doi:10.1038/s41593-022-01088-4 2022
[23]

Abstract representations emerge naturally in neural networks trained to perform multiple tasks

W Jeffrey Johnston and Stefano Fusi. “Abstract representations emerge naturally in neural networks trained to perform multiple tasks”. en. In:Nat. Commun.14.1 (Feb. 2023), p. 1040

2023
[24]

Flexible Multitask Computation in Recurrent Networks Utilizes Shared Dynamical Motifs

Laura N. Driscoll, Krishna Shenoy, and David Sussillo. “Flexible Multitask Computation in Recurrent Networks Utilizes Shared Dynamical Motifs”. In:Nature Neuroscience27.7 (July 2024), pp. 1349– 1363.doi:10.1038/s41593-024-01668-6

work page doi:10.1038/s41593-024-01668-6 2024
[25]

Modular representations emerge in neural networks trained to perform context-dependent tasks

W Jeffrey Johnston and Stefano Fusi. “Modular representations emerge in neural networks trained to perform context-dependent tasks”. en. In:bioRxivorg(Oct. 2024), p. 2024.09. 30.615925

2024
[26]

Universality and individuality in neural dynamics across large populations of recurrent networks

Niru Maheswaranathan et al. “Universality and individuality in neural dynamics across large populations of recurrent networks”. en. In:Adv. Neural Inf. Process. Syst.2019 (Dec. 2019), pp. 15629–15641

2019
[27]

Individual differences among deep neural network models

Johannes Mehrer et al. “Individual differences among deep neural network models”. en. In:Nat. Commun.11.1 (Nov. 2020), p. 5725

2020
[28]

Charting and navigating the space of solutions for recurrent neural networks

E Turner, K V Dabholkar, and O Barak. “Charting and navigating the space of solutions for recurrent neural networks”. In:Thirty-Fifth Conference on Neural(2021)

2021
[29]

The Connected-Component Labeling Problem: A Review of State-of-the-Art Algorithms

Timo Flesch et al. “Orthogonal Representations for Robust Context-Dependent Task Performance in Brains and Neural Networks”. In:Neuron110.7 (Apr. 2022), 1258–1270.e11.doi: 10.1016/j. neuron.2022.01.005

work page doi:10.1016/j 2022
[30]

Aligned and oblique dynamics in recurrent neural networks

Friedrich Schuessler et al. “Aligned and oblique dynamics in recurrent neural networks”. en. In: Elife13.RP93060 (Nov. 2024), RP93060

2024
[31]

How connectivity structure shapes rich and lazy learning in neural circuits

Yuhan Helena Liu et al. “How connectivity structure shapes rich and lazy learning in neural circuits”. en. In:ArXiv(Oct. 2023)

2023
[32]

A Mean Field View of the Landscape of Two-Layer Neural Networks

Song Mei, Andrea Montanari, and Phan-Minh Nguyen. “A Mean Field View of the Landscape of Two-Layer Neural Networks”. In:Proceedings of the National Academy of Sciences115.33 (Aug. 2018), E7665–E7671.doi:10.1073/pnas.1806579115

work page doi:10.1073/pnas.1806579115 2018
[33]

Trainability and Accuracy of Neural Networks: An Interacting Particle System Approach

Grant M. Rotskoff and Eric Vanden-Eijnden. “Trainability and Accuracy of Neural Networks: An Interacting Particle System Approach”. In:Communications on Pure and Applied Mathematics75.9 (Sept. 2022), pp. 1889–1935.doi:10.1002/cpa.22074

work page doi:10.1002/cpa.22074 2022
[34]

Justin Sirignano and Konstantinos Spiliopoulos.Mean Field Analysis of Neural Networks: A Law of Large Numbers. Nov. 2019.doi:10.48550/arXiv.1805.01053

work page doi:10.48550/arxiv.1805.01053 2019
[35]

Information theory and statistical mechanics

E T Jaynes. “Information theory and statistical mechanics”. In:Phys. Rev.106.4 (May 1957), pp. 620–630

1957
[36]

On the control of automatic processes: a parallel distributed processing account of the Stroop effect

J D Cohen, K Dunbar, and J L McClelland. “On the control of automatic processes: a parallel distributed processing account of the Stroop effect”. en. In:Psychol. Rev.97.3 (July 1990), pp. 332– 361

1990
[37]

The sparseness of mixed selectivity neurons controls the generalization-discrimination trade-off

Omri Barak, Mattia Rigotti, and Stefano Fusi. “The sparseness of mixed selectivity neurons controls the generalization-discrimination trade-off”. en. In:J. Neurosci.33.9 (Feb. 2013), pp. 3844–3856. 16

2013
[38]

Neural correlates of task switching in prefrontal cortex and primary auditory cortex in a novel stimulus selection task for rodents

Chris C Rodgers and Michael R DeWeese. “Neural correlates of task switching in prefrontal cortex and primary auditory cortex in a novel stimulus selection task for rodents”. en. In:Neuron82.5 (June 2014), pp. 1157–1170

2014
[39]

Abstract Context Representations in Primate Amygdala and Prefrontal Cortex

A Saez et al. “Abstract Context Representations in Primate Amygdala and Prefrontal Cortex”. en. In:Neuron87.4 (Aug. 2015), pp. 869–881

2015
[40]

Cortical Information Flow during Flexible Sensorimotor Decisions

Markus Siegel, Timothy J. Buschman, and Earl K. Miller. “Cortical Information Flow during Flexible Sensorimotor Decisions”. In:Science348.6241 (June 2015), pp. 1352–1355.doi: 10.1126/ science.aab0551

2015
[41]

Individual Variability of Neural Computations Underlying Flexible Decisions

Marino Pagan et al. “Individual Variability of Neural Computations Underlying Flexible Decisions”. In:Nature639.8054 (Mar. 2025), pp. 421–429.doi:10.1038/s41586-024-08433-6

work page doi:10.1038/s41586-024-08433-6 2025
[42]

Czarnik, and Marlene R

Ramanujan Srinath, Martyna M. Czarnik, and Marlene R. Cohen.Coordinated Response Modulations Enable Flexible Use of Visual Information. July 2024.doi:10.1101/2024.07.10.602774

work page doi:10.1101/2024.07.10.602774 2024
[43]

Task Set and Prefrontal Cortex

Katsuyuki Sakai. “Task Set and Prefrontal Cortex”. In:Annu. Rev. Neurosci.31.1 (2008), pp. 219– 245

2008
[44]

Neural Mechanisms that Make Perceptual Decisions Flexible

Gouki Okazawa and Roozbeh Kiani. “Neural Mechanisms that Make Perceptual Decisions Flexible”. en. In:Annu. Rev. Physiol.(Nov. 2022)

2022
[45]

Saxe, Shagun Sodhani, and Sam Lewallen.The Neural Race Reduction: Dynamics of Abstraction in Gated Networks

Andrew M. Saxe, Shagun Sodhani, and Sam Lewallen.The Neural Race Reduction: Dynamics of Abstraction in Gated Networks. July 2022.doi:10.48550/arXiv.2207.10430

work page doi:10.48550/arxiv.2207.10430 2022
[46] David Raposo, Matthew T. Kaufman, and Anne K. Churchland. “A Category-Free Neural Population Supports Evolving Demands during Decision-Making”. In: Nature Neuroscience 17.12 (Dec. 2014), pp. 1784–1792. doi:10.1038/nn.3865

[47] Arthur Jacot, Franck Gabriel, and Clement Hongler. “Neural Tangent Kernel: Convergence and Generalization in Neural Networks”. In: Advances in Neural Information Processing Systems. Ed. by S. Bengio et al. Vol. 31. Curran Associates, Inc., 2018, pp. 8571–8580

[48] L. Chizat, E. Oyallon, and F. Bach. “On lazy training in differentiable programming”. In: Adv. Neural Inf. Process. Syst. (2019)

[49] Sanjeev Arora et al. “Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks”. In: Proceedings of the 36th International Conference on Machine Learning. Ed. by Kamalika Chaudhuri and Ruslan Salakhutdinov. Vol. 97. Proceedings of Machine Learning Research. PMLR, 2019, pp. 322–332

[50] Jaehoon Lee et al. “Wide neural networks of any depth evolve as linear models under gradient descent”. In: Adv. Neural Inf. Process. Syst. 32 (2019)

[51] Blake Woodworth et al. “Kernel and Rich Regimes in Overparametrized Models”. In: Proceedings of Thirty Third Conference on Learning Theory. Ed. by Jacob Abernethy and Shivani Agarwal. Vol. 125. Proceedings of Machine Learning Research. PMLR, 2020, pp. 3635–3673

[52] Mario Geiger et al. “Disentangling feature and lazy training in deep neural networks”. In: J. Stat. Mech.: Theory Exp. 2020.11 (Nov. 2020), p. 113301

[53] Jonas Paccolat et al. “Geometric compression of invariant manifolds in neural nets”. In: arXiv preprint arXiv:2007.11471 (2020)

[54] Matthew Chalk, Olivier Marre, and Gašper Tkačik. “Toward a Unified Theory of Efficient, Predictive, and Sparse Coding”. In: Proceedings of the National Academy of Sciences 115.1 (Jan. 2018), pp. 186–191. doi:10.1073/pnas.1711114115

[55] Nicolas Frémaux and Wulfram Gerstner. “Neuromodulated Spike-timing-Dependent Plasticity, and theory of three-factor learning rules”. In: Front. Neural Circuits 9 (2015), p. 85

[56] Jeffrey C. Magee and Christine Grienberger. “Synaptic plasticity forms and functions”. In: Annu. Rev. Neurosci. 43.1 (July 2020), pp. 95–117

[57] Timothy P. Lillicrap et al. “Random Synaptic Feedback Weights Support Error Backpropagation for Deep Learning”. In: Nature Communications 7.1 (Nov. 2016), p. 13276. doi:10.1038/ncomms13276

[58] Arild Nøkland. “Direct Feedback Alignment Provides Learning in Deep Neural Networks”. In: Advances in Neural Information Processing Systems. Vol. 29. Curran Associates, Inc., 2016

[59] Andrew M. Saxe, James L. McClelland, and Surya Ganguli. “A mathematical theory of semantic development in deep neural networks”. In: Proc. Natl. Acad. Sci. U. S. A. 116.23 (June 2019), pp. 11537–11546

[60] Blake Bordelon and Cengiz Pehlevan. The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks. Oct. 2022. https://arxiv.org/abs/2210.02157v2

[61] Pratik Chaudhari and Stefano Soatto. “Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks”. In: arXiv [cs.LG] (Oct. 2017)

[62] Yao Zhang et al. “Energy–entropy competition and the effectiveness of stochastic gradient descent in machine learning”. In: Mol. Phys. 116.21-22 (Nov. 2018), pp. 3214–3223

[63] Shishir Adhikari et al. “Machine learning in and out of equilibrium”. In: arXiv [cs.LG] (June 2023)

[64] Stephan Mandt, Matthew D. Hoffman, and David M. Blei. Stochastic Gradient Descent as Approximate Bayesian Inference. Jan. 2018. doi:10.48550/arXiv.1704.04289

[65] S. Linderman, M. Johnson, A. Miller, et al. “Bayesian learning and inference in recurrent switching linear dynamical systems”. In: Artif. Intell. (2017)

[66] Katherine Morrison et al. “Diversity of emergent dynamics in competitive threshold-linear networks”. In: SIAM J. Appl. Dyn. Syst. 23.1 (Mar. 2024), pp. 855–884

[67] Katie A. Ferguson and Jessica A. Cardin. “Mechanisms underlying gain modulation in the cortex”. In: Nat. Rev. Neurosci. 21.2 (Feb. 2020), pp. 80–92

[68] E. Salinas and P. Thier. “Gain modulation: a major computational principle of the central nervous system”. In: Neuron 27.1 (July 2000), pp. 15–21

[69] Jake P. Stroud et al. “Motor primitives in space and time via targeted gain modulation in cortical networks”. In: Nat. Neurosci. 21.12 (Dec. 2018), pp. 1774–1783

[70] Julia C. Costacurta et al. “Structured flexibility in recurrent neural networks via neuromodulation”. In: bioRxiv 37 (July 2024), pp. 1954–1972

[71] Laureline Logiaco, L. F. Abbott, and Sean Escola. “Thalamic control of cortical dynamics in a model of flexible motor sequencing”. In: Cell Rep. 35.9 (June 2021), p. 109090

[72] Ta-Chu Kao, Mahdieh S. Sadabadi, and Guillaume Hennequin. “Optimal anticipatory control as a theory of motor preparation: A thalamo-cortical circuit model”. In: Neuron 109.9 (May 2021), 1567–1581.e12

[73] Christopher Langdon and Tatiana A. Engel. “Latent circuit inference from heterogeneous neural responses during cognitive tasks”. In: Nat. Neurosci. 28.3 (Mar. 2025), pp. 665–675

[74] Francesca Mastrogiuseppe and Srdjan Ostojic. “Linking Connectivity, Dynamics, and Computations in Low-Rank Recurrent Neural Networks”. In: Neuron 99.3 (Aug. 2018), 609–623.e29. doi:10.1016/j.neuron.2018.07.003

[75] Adrian Valente, Jonathan Pillow, and Srdjan Ostojic. “Extracting computational mechanisms from neural activity with low-rank networks”. In: Neur Inf Proc Sys 35 (2022), pp. 24072–24086

[76] Joao Barbosa et al. “Early selection of task-relevant features through population gating”. In: Nat. Commun. 14.1 (Oct. 2023), p. 6837

[77] Manuel Beiran et al. “Shaping Dynamics With Multiple Populations in Low-Rank Recurrent Networks”. In: Neural Comput. 33.6 (May 2021), pp. 1572–1615

[78] David Rosenberg and Julia Kempe. Lagrangian Duality and Convex Optimization. Lecture Notes. CDS, NYU, Feb. 2019. url: https://davidrosenberg.github.io/mlcourse/Archive/2019/Lectures/04a.convex-optimization.pdf

[79] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge New York Melbourne New Delhi Singapore: Cambridge University Press, 2023. 727 pp.

[80] Alain-Sol Sznitman. “Topics in Propagation of Chaos”. In: École d’Été de Probabilités de Saint-Flour XIX—1989. Vol. 1464. Berlin, Heidelberg: Springer Berlin Heidelberg, 1991, pp. 165–251. doi:10.1007/BFb0085169
