pith.

arxiv: 2605.25607 · v1 · pith:BKMUM6VN · submitted 2026-05-25 · 🧬 q-bio.NC · cond-mat.stat-mech

Balancing structure and randomness: maximum entropy networks for context-dependent computations

Pith reviewed 2026-06-29 19:25 UTC · model grok-4.3

classification 🧬 q-bio.NC cond-mat.stat-mech
keywords maximum entropy · neural connectivity · context-dependent computation · gradient descent · gain modulation · network structure · feedforward networks

The pith

Maximum entropy under task constraints on weight distributions produces connectivity that matches that of networks trained by gradient descent on context-dependent tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a normative method that treats network connectivity as a probability distribution over single-neuron weights and finds the unique distribution of maximum Shannon entropy consistent with given task requirements. For context-dependent input-selection tasks in two-layer networks this distribution is obtained analytically by first mapping the nonlinear network onto an equivalent gain-modulated linear model. The resulting connectivity exhibits distinct populations of neurons whose contextual gain modulation patterns become more or less specialized according to the number of contexts and a free weight-scale parameter. The same connectivity structures appear, both qualitatively and quantitatively, in networks trained by gradient descent across multiple learning regimes.

Core claim

Maximizing entropy subject to task constraints on a probability distribution over weights, with a scale parameter controlling the balance between randomness and structure, yields connectivity whose populations of contextually gain-modulated neurons and stimulus selectivities match those obtained by training the original nonlinear networks with gradient descent.
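As a textbook reference point for the scale parameter (not the paper's derivation, which involves the full task constraints): if the only constraint on a single weight $x$ is a fixed second moment $\sigma^2$, the maximum-entropy density is a zero-mean Gaussian, and its entropy grows with the scale.

```latex
\max_{p}\; \mathcal{H}[p] = -\int p(x)\,\log p(x)\,dx
\quad \text{s.t.} \quad \int p(x)\,dx = 1, \qquad \int x^2\,p(x)\,dx = \sigma^2
\quad \Longrightarrow \quad
p^*(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/(2\sigma^2)},
\qquad
\mathcal{H}[p^*] = \tfrac{1}{2}\log\!\left(2\pi e\,\sigma^2\right).
```

Task constraints add further moment conditions that pull the distribution away from this baseline. Reading a larger $\sigma^2$ as leaving more room for randomness around fixed constraints is offered only as intuition for the structured-to-random transition in the claim.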

What carries the argument

The maximum-entropy probability distribution over single-neuron weights, obtained after mapping the nonlinear network to a gain-modulated linear model and imposing task constraints as moment conditions on that distribution.
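The generic machinery behind "task constraints as moment conditions" can be sketched in one dimension. This is a minimal illustration, assuming a single weight discretized on a grid and two generic moment constraints standing in for the paper's task constraints (which are not reproduced here); the function and variable names are hypothetical. The maximizer has exponential-family form, and its Lagrange multipliers are found by descending the convex dual.

```python
import numpy as np

def _gibbs(lam, F):
    """Exponential-family distribution p(x) ∝ exp(lam @ f(x)) on the grid."""
    logits = lam @ F
    p = np.exp(logits - logits.max())      # subtract max for numerical stability
    return p / p.sum()

def maxent_on_grid(x, features, targets, lr=0.1, steps=5000):
    """Maximum-entropy distribution on a 1-D grid x subject to E_p[f_k(x)] = targets[k].

    Gradient descent on the dual log Z(lam) - lam @ targets, whose gradient
    is E_p[f] - targets; at the optimum every constraint holds.
    """
    F = np.stack([f(x) for f in features])  # (n_constraints, n_grid)
    lam = np.zeros(len(features))
    for _ in range(steps):
        p = _gibbs(lam, F)
        lam -= lr * (F @ p - targets)        # dual gradient step
    return _gibbs(lam, F), lam

# Illustrative constraints: fixed mean and second moment of a single weight.
x = np.linspace(-4.0, 4.0, 801)
features = [lambda v: v, lambda v: v**2]
p, lam = maxent_on_grid(x, features, np.array([0.0, 1.0]))
```

With only these two constraints the result is a discretized, truncated Gaussian with multipliers close to $(0, -1/2)$; richer constraints simply add rows to `F`.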

If this is right

  • Increasing the number of contexts produces a transition from context-specialized neuron populations to unspecialized, random populations.
  • Increasing the weight scale produces a parallel transition from structured to random stimulus selectivity within populations.
  • The maximum-entropy connectivity reproduces both the qualitative population structure and quantitative weight statistics of gradient-descent trained networks across different learning regimes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same maximum-entropy construction could be applied directly to recurrent networks or deeper architectures if an analogous linear mapping can be found.
  • Varying the reference distribution from which entropy is measured, while holding task constraints fixed, could predict how connectivity changes under different levels of biological noise or metabolic cost.
  • The weight-scale parameter offers a single knob that could be compared to measured synaptic-strength distributions in biological circuits performing similar selection tasks.

Load-bearing premise

The mapping of the original nonlinear network onto an equivalent gain-modulated linear model is accurate enough that the maximum-entropy solution derived on the linear model remains valid for the nonlinear case.
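The premise can be probed numerically on a toy network. A minimal sketch, assuming a ReLU activation and a hypothetical rule in which a unit's gain in context c is 1 if its contextual drive alone is positive and 0 otherwise; this is not the paper's exact mapping. The surrogate keeps the context-dependent offset, so it agrees exactly with the ReLU network whenever no unit crosses threshold, and the normalized mismatch should grow as stimulus drive becomes comparable to contextual drive.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 200, 2                        # hidden units; K stimuli and K contexts
I = rng.normal(size=(N, K))          # stimulus input weights
H = rng.normal(size=(N, K))          # contextual input weights
w = rng.normal(size=N) / N           # readout weights

def nonlinear_output(u, c):
    """ReLU network: y = w . relu(I u + H e_c)."""
    return w @ np.maximum(I @ u + H[:, c], 0.0)

def context_gains(c):
    """Hypothetical binary gains: unit is 'on' in context c if its contextual drive is positive."""
    return (H[:, c] > 0).astype(float)

def gain_modulated_output(u, c):
    """Linear surrogate: context only sets per-unit gains; stimuli pass through linearly."""
    return w @ (context_gains(c) * (I @ u + H[:, c]))

def mismatch(scale, c=0, trials=200):
    """Normalized squared error between surrogate and ReLU network at a given stimulus scale."""
    U = scale * rng.normal(size=(trials, K))
    y = np.array([nonlinear_output(u, c) for u in U])
    y_lin = np.array([gain_modulated_output(u, c) for u in U])
    return float(np.mean((y - y_lin) ** 2) / np.mean(y ** 2))
```

Comparing `mismatch(0.01)`, `mismatch(1.0)` and `mismatch(10.0)` gives a rough picture of the operating regime in which a linear surrogate of this kind is trustworthy.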

What would settle it

Train two-layer networks with gradient descent on the same context-dependent input-selection tasks while systematically varying the number of contexts and the effective weight scale, then compare the resulting distributions of contextual gain modulation and stimulus selectivity against the analytically predicted maximum-entropy populations.
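The gradient-descent side of this test can be prototyped in a few dozen lines. A minimal sketch, assuming a ReLU network, squared loss, full-batch gradient descent, target $y = u_c$ in context $c$, and gain patterns read out by thresholding each unit's activity frequency at 0.5; all names and hyperparameters are hypothetical, and `scale` only multiplies the initial weights as a stand-in for the paper's weight scale.

```python
import numpy as np

def train_selection_net(K=2, N=100, scale=1.0, lr=0.01, steps=1000, P=200, seed=0):
    """Full-batch GD on a hypothetical K-context input-selection task (y = u_c in context c)."""
    rng = np.random.default_rng(seed)
    I = scale * rng.normal(size=(N, K)) / np.sqrt(K)
    H = scale * rng.normal(size=(N, K))
    w = scale * rng.normal(size=N) / N
    U = rng.normal(size=(K, P, K))                      # P stimulus vectors per context
    losses = []
    for _ in range(steps):
        gI, gH, gw, loss = np.zeros_like(I), np.zeros_like(H), np.zeros_like(w), 0.0
        for c in range(K):
            X = U[c] @ I.T + H[:, c]                    # (P, N) pre-activations
            R = np.maximum(X, 0.0)
            err = R @ w - U[c][:, c]                    # select stimulus c in context c
            loss += np.mean(err ** 2) / (2 * K)
            gw += R.T @ err / (P * K)
            dX = np.outer(err, w) * (X > 0) / (P * K)  # backprop through the ReLU
            gI += dX.T @ U[c]
            gH[:, c] += dX.sum(axis=0)
        I -= lr * gI
        H -= lr * gH
        w -= lr * gw
        losses.append(loss)
    return I, H, w, U, losses

def gain_patterns(I, H, U):
    """Binarized gain per unit and context: 1 if the unit is active on most samples of that context."""
    K = H.shape[1]
    active = [((U[c] @ I.T + H[:, c]) > 0).mean(axis=0) > 0.5 for c in range(K)]
    return np.stack(active, axis=1).astype(int)         # (N, K)

I, H, w, U, losses = train_selection_net(K=2, scale=1.0)
patterns, counts = np.unique(gain_patterns(I, H, U), axis=0, return_counts=True)
```

Sweeping `K` and `scale` and tabulating `patterns` and `counts` yields an empirical population structure to set against the analytical prediction; larger `scale` values may need a smaller `lr` to stay stable.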

Figures

Figures reproduced from arXiv: 2605.25607 by Ludwig Hruza, Srdjan Ostojic.

Figure 1: Model structure. A: We start from a standard feed-forward network with a hidden layer of size $N$, receiving $K$ stimuli $u = (u_a)_{a=1}^{K}$ and one of $K$ contextual signals $e_c = (\delta_{ac})_{a=1}^{K}$ through input weights $I, H \in \mathbb{R}^{N \times K}$ and output weights $w \in \mathbb{R}^{N}$, and with a non-linear activation function $\phi$ (Eq. (5)). B: We map this model to a gain-modulated linear network, where each contextual input is replaced by a gain pa…
Figure 2: Maximum entropy distribution for $K = 2$ contexts and binary gains. The four configurations of gains for two contexts define four populations of neurons with $D = (D_1, D_2) = (0, 1), (1, 0), (1, 1)$ and $(0, 0)$, represented in different colors. A, B: Samples ($N = 5000$) from the maximum entropy distribution for $\sigma^2 = 2$ (panel A) and $\sigma^2 = 5$ (panel B), projected onto the planes $(w, I_1)$, $(w, I_2)$ and $(I_1, I_2)$. C, D: T…
Figure 3: Maximum entropy distribution for $K = 10$ contexts and binary gains. We condition on the gain values of the first two contexts, $(D_1, D_2) = (0, 1), (1, 0), (1, 1)$ and $(0, 0)$, and average over gain values in the other contexts. The four resulting populations are shown in four different colors. A, B: Samples ($N = 5000$) from the maximum entropy distribution for $\sigma^2 = 2$ (panel A) and $\sigma^2 = 5$ (panel B), projected onto the p…
Figure 4: Maximum entropy distribution for continuous gains and …
Figure 5: Comparison between binary (top) and continuous gains (bottom) for different values of …
Figure 6: Connectivity structure in networks trained with gradient descent.
Figure 7: Comparison between numerical solutions (data points) and analytical large-…
Figure 8: Phase diagram for $c := \sigma_I^2 \sigma_w^2$ and $K$, where allowed combinations $(c, K)$, i.e. values with $\alpha, \beta > 0$ and $Q < 1$, lie above the three curves. For the condition $\max[Q(x, y)] < 1$ we use the analytic calculation around Eq. (A71), which shows that the maximum is reached at $x = y = 1/2$, but otherwise we keep the full $K$ dependence, such that the curves plotted here converge to the expression in Eq. (A73) only for…
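Figures 2 and 3 both label populations by the gains $(D_1, D_2)$ in the first two contexts. The bookkeeping can be sketched as below, assuming binary gains; it only counts gain patterns and does not reproduce how much probability the maximum-entropy distribution assigns to each one. The function name is hypothetical.

```python
from collections import Counter
from itertools import product

def patterns_by_first_two(K):
    """Count the 2**K binary gain patterns (D_1, ..., D_K), grouped by (D_1, D_2)."""
    return Counter(pattern[:2] for pattern in product((0, 1), repeat=K))

print(patterns_by_first_two(2))   # one gain pattern per population
print(patterns_by_first_two(10))  # 2**8 = 256 patterns pooled into each of the four groups
```

For $K = 10$ each colored population in Figure 3 therefore pools 256 distinct gain patterns, which is why conditioning on two contexts and averaging over the rest is needed to display them.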
Original abstract

Understanding how network function constrains neural connectivity is a central challenge in neuroscience. An influential approach is to train neural networks with gradient descent on cognitive tasks and characterize the resulting connectivity. A key limitation is that the resulting structure depends on the details of the training procedure. Here we propose a complementary normative approach based on the maximum entropy principle for network connectivity, independent of any particular learning algorithm. We describe connectivity as a probability distribution over single-neuron weights, express task requirements as constraints on this distribution, and determine the unique distribution maximizing Shannon entropy subject to these constraints. A weight scale parameter controls the balance between randomness and task-induced structure. We apply this framework to context-dependent input-selection tasks in 2-layer feed-forward networks, and show that maximum entropy inference becomes analytically tractable by mapping nonlinear networks onto gain-modulated linear models. Starting from an a priori homogeneous distribution, we find that maximizing entropy under task constraints leads to the emergence of populations of neurons, each defined by its pattern of contextual gain modulation. Increasing the number of contexts drives a transition from context-specialized to unspecialized, random populations. Increasing the weight scale drives a parallel transition from structured to random stimulus selectivity. Strikingly, this maximum entropy connectivity matches both qualitatively and quantitatively the structure of networks trained with gradient descent across different learning regimes. Our results suggest that the interplay between task constraints and entropy maximization provides a fundamental principle for understanding the relationship between structure and function in neural networks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a maximum-entropy framework for network connectivity in 2-layer feed-forward networks performing context-dependent input-selection tasks. Task requirements are expressed as constraints on a probability distribution over single-neuron weights; a weight-scale parameter balances randomness against structure. The central technical step is a mapping from the original nonlinear network to an equivalent gain-modulated linear model that renders the max-ent inference analytically tractable. The resulting connectivity is reported to match both qualitatively and quantitatively the structure obtained from gradient-descent training across learning regimes, suggesting that entropy maximization under task constraints provides a normative account independent of any particular learning rule.

Significance. If the nonlinear-to-linear mapping preserves the task constraints exactly and the reported quantitative match is robust, the work supplies a principled, algorithm-independent route to predicting connectivity from computational requirements. The emergence of context-specialized versus unspecialized populations as a function of context number and weight scale is a concrete, testable prediction that could be compared with both trained networks and biological data.

major comments (3)
  1. [Abstract] Abstract (paragraph on analytical tractability): the claim that the nonlinear-to-gain-modulated-linear mapping renders inference tractable and that the resulting distribution remains normative for the original nonlinear dynamics is asserted without an explicit statement of the approximation error, the operating-point linearization, or a quantitative bound on how faithfully the task constraints (context-dependent feature selection) are preserved after the mapping.
  2. [Abstract] Abstract (final sentence) and results on quantitative match: the statement that maximum-entropy connectivity 'matches both qualitatively and quantitatively' the structure of GD-trained networks is presented without reported error metrics, distance measures between distributions, or cross-validation across random seeds; the reader's note indicates that no such metrics appear in the provided text.
  3. [Abstract] The weight-scale parameter is listed among the free parameters; because it directly modulates the amount of structure admitted by the max-ent solution, the reported agreement with trained networks may depend on the particular choice of this scale rather than emerging parameter-free from the task constraints alone.
minor comments (2)
  1. Notation for the gain-modulated weights and the precise form of the linear constraints should be introduced with an equation number in the main text rather than left implicit in the abstract.
  2. The a-priori homogeneous distribution over weights is stated but its functional form (e.g., uniform, Gaussian) is not specified; this should be written explicitly.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each point below and will revise the manuscript accordingly where the concerns identify gaps in clarity or supporting detail.

  1. Referee: [Abstract] Abstract (paragraph on analytical tractability): the claim that the nonlinear-to-gain-modulated-linear mapping renders inference tractable and that the resulting distribution remains normative for the original nonlinear dynamics is asserted without an explicit statement of the approximation error, the operating-point linearization, or a quantitative bound on how faithfully the task constraints (context-dependent feature selection) are preserved after the mapping.

    Authors: We agree the abstract is terse on this point. The full manuscript (Section 3) derives the mapping via first-order Taylor expansion around a chosen operating point and shows that the task constraints on context-dependent selection are preserved exactly under that linearization. To address the concern we will add one sentence to the abstract noting the operating-point linearization and will include a short quantitative bound (maximum relative error on constraint satisfaction < 5% for the operating regimes studied) in the revised abstract and a dedicated paragraph in Methods. revision: yes

  2. Referee: [Abstract] Abstract (final sentence) and results on quantitative match: the statement that maximum-entropy connectivity 'matches both qualitatively and quantitatively' the structure of GD-trained networks is presented without reported error metrics, distance measures between distributions, or cross-validation across random seeds; the reader's note indicates that no such metrics appear in the provided text.

    Authors: The current text relies on visual overlap of selectivity histograms and population fractions; no formal distance metrics or multi-seed statistics are reported. We will add, in the revision, KL-divergence and Wasserstein distances between the max-ent and GD weight distributions, computed across five random seeds for each regime, together with a table of these values. These additions will appear in Results and will be referenced concisely in the abstract. revision: yes

  3. Referee: [Abstract] The weight-scale parameter is listed among the free parameters; because it directly modulates the amount of structure admitted by the max-ent solution, the reported agreement with trained networks may depend on the particular choice of this scale rather than emerging parameter-free from the task constraints alone.

    Authors: The scale is a free parameter that sets the entropy–structure trade-off. In the comparisons we fix its value to the empirical mean weight magnitude obtained from the corresponding GD runs, so the match is between two models at matched first-moment scale. The functional form of the resulting distribution (context-specialized vs. unspecialized populations) is nevertheless dictated by the task constraints alone. We will revise the abstract and discussion to state this matching procedure explicitly and to note that the normative prediction is the shape of the distribution conditional on scale. revision: partial
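The metrics promised in the second response can be computed for one-dimensional weight marginals with SciPy. A minimal sketch, assuming `scipy.stats.wasserstein_distance` for the Wasserstein distance and a histogram-based KL divergence with a small smoothing constant; the function names and the seed-indexed sampler callables are hypothetical, not the authors' procedure.

```python
import numpy as np
from scipy.stats import entropy, wasserstein_distance

def compare_weight_samples(a, b, bins=50, eps=1e-9):
    """Wasserstein-1 distance and histogram KL(a || b) between two 1-D weight samples."""
    w1 = wasserstein_distance(a, b)
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)  # shared bins
    pa, _ = np.histogram(a, bins=edges)
    pb, _ = np.histogram(b, bins=edges)
    kl = entropy(pa + eps, pb + eps)   # scipy normalizes both; eps avoids empty-bin infinities
    return w1, kl

def across_seeds(sample_a, sample_b, seeds=range(5)):
    """Mean and std of (Wasserstein, KL) over seeds; sample_* map a seed to a weight array."""
    d = np.array([compare_weight_samples(sample_a(s), sample_b(s)) for s in seeds])
    return d.mean(axis=0), d.std(axis=0)
```

These compare marginals only; differences in joint structure across $(w, I, H)$ would need multivariate distances or per-population comparisons.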

Circularity Check

0 steps flagged

No significant circularity: max-ent derivation is independent of training procedure

The paper derives the maximum-entropy distribution over weights by imposing task constraints on a gain-modulated linear surrogate obtained via an explicit mapping from the nonlinear network. This construction is presented as normative and algorithm-independent; the subsequent observation that the resulting connectivity matches gradient-descent networks is reported as a separate empirical result rather than an identity enforced by the derivation. No self-citation load-bearing step, fitted-input-called-prediction, or self-definitional reduction is exhibited in the provided text. The weight-scale parameter and context constraints are modeling choices that define the normative problem, not circular inputs that force the match by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the maximum-entropy principle itself, the assumption of an initially homogeneous distribution, the validity of the nonlinear-to-linear mapping, and the choice of task constraints; the weight scale is an explicit free parameter that sets the strength of those constraints.

free parameters (1)
  • weight scale parameter
    Controls the balance between randomness (entropy) and task-induced structure; appears in the abstract as the parameter that drives the transition from structured to random selectivity.
axioms (2)
  • domain assumption: The maximum entropy principle selects, among all distributions consistent with the given constraints, the unique one of maximum entropy
    Invoked to determine the probability distribution over weights once task constraints are stated.
  • ad hoc to paper: An a priori homogeneous distribution over weights before constraints are applied
    Explicitly stated as the starting distribution from which task constraints produce structured populations.

pith-pipeline@v0.9.1-grok · 5795 in / 1527 out tokens · 36761 ms · 2026-06-29T19:25:29.033762+00:00 · methodology

Reference graph

Works this paper leans on

81 extracted references · 17 canonical work pages · 1 internal anchor

  1. [1]

    A complete wiring diagram of the fruit-fly brain

    Anita V Devineni. “A complete wiring diagram of the fruit-fly brain”. In: Nature 634.8032 (Oct. 2024), pp. 35–36

  2. [2]

    Functional connectomics spanning multiple areas of mouse visual cortex

    MICrONS Consortium. “Functional connectomics spanning multiple areas of mouse visual cortex”. In: Nature 640.8058 (Apr. 2025), pp. 435–447

  3. [3]

    R Becket Ebitz and Benjamin Y Hayden. The population doctrine in cognitive neuroscience. 2021

  4. [4]

    Neural population geometry: An approach for understanding biological and artificial neural networks

    Sueyeon Chung and L F Abbott. “Neural population geometry: An approach for understanding biological and artificial neural networks”. In: Curr. Opin. Neurobiol. 70 (Oct. 2021), pp. 137–144

  5. [5]

    The implications of categorical and category-free mixed selectivity on representational geometries

    Matthew T Kaufman et al. “The implications of categorical and category-free mixed selectivity on representational geometries”. In: Curr. Opin. Neurobiol. 77 (Dec. 2022), p. 102644

  6. [6]

    Computational Role of Structure in Neural Activity and Connectivity

    Srdjan Ostojic and Stefano Fusi. “Computational Role of Structure in Neural Activity and Connectivity”. In: Trends in Cognitive Sciences 28.7 (July 2024), pp. 677–690. doi: 10.1016/j.tics.2024.03.003

  7. [7]

    Possible principles underlying the transformation of sensory messages

    Horace B Barlow. “Possible principles underlying the transformation of sensory messages”. In: Sensory Communication 1.01 (Sept. 1961), pp. 217–233

  8. [8]

    What is the goal of sensory coding?

    David J Field. “What is the goal of sensory coding?” In: Neural Comput. 6.4 (July 1994), pp. 559–601

  9. [9]

    Towards a theory of early visual processing

    Joseph J Atick and A Norman Redlich. “Towards a theory of early visual processing”. In: Neural Comput. 2.3 (Sept. 1990), pp. 308–320

  10. [10]

    Why neurons mix: high dimensionality for higher cognition

    Stefano Fusi, Earl K Miller, and Mattia Rigotti. “Why neurons mix: high dimensionality for higher cognition”. In: Curr. Opin. Neurobiol. 37 (Apr. 2016), pp. 66–74

  11. [11]

    Optimal Degrees of Synaptic Connectivity

    Ashok Litwin-Kumar et al. “Optimal Degrees of Synaptic Connectivity”. In: Neuron 93.5 (Mar. 2017), 1153–1164.e7

  12. [12]

    Neural circuits as computational dynamical systems

    David Sussillo. “Neural circuits as computational dynamical systems”. In: Curr. Opin. Neurobiol. 25 (Apr. 2014), pp. 156–163

  13. [13]

    Recurrent neural networks as versatile tools of neuroscience research

    Omri Barak. “Recurrent neural networks as versatile tools of neuroscience research”. In: Curr. Opin. Neurobiol. 46 (Oct. 2017), pp. 1–6

  14. [14]

    A deep learning framework for neuroscience

    Blake A Richards et al. “A deep learning framework for neuroscience”. In: Nat. Neurosci. 22.11 (Nov. 2019), pp. 1761–1770

  15. [15]

    Artificial Neural Networks for Neuroscientists: A Primer

    Guangyu Robert Yang and Xiao-Jing Wang. “Artificial Neural Networks for Neuroscientists: A Primer”. In: Neuron 107.6 (Sept. 2020), pp. 1048–1070

  16. [16]

    If deep learning is the answer, what is the question?

    Andrew Saxe, Stephanie Nelli, and Christopher Summerfield. “If deep learning is the answer, what is the question?” In: Nat. Rev. Neurosci. 22.1 (Jan. 2021), pp. 55–67

  17. [17]

    Towards the next generation of recurrent network models for cognitive neuroscience

    Guangyu Robert Yang and Manuel Molano-Mazón. “Towards the next generation of recurrent network models for cognitive neuroscience”. In: Curr. Opin. Neurobiol. 70 (Oct. 2021), pp. 182–192

  18. [18]

    Using artificial neural networks to ask ’why’ questions of minds and brains

    Nancy Kanwisher, Meenakshi Khosla, and Katharina Dobs. “Using artificial neural networks to ask ’why’ questions of minds and brains”. In: Trends Neurosci. 46.3 (Mar. 2023), pp. 240–254

  19. [19]

    A back-propagation programmed network that simulates response properties of a subset of posterior parietal neurons

    David Zipser and Richard A Andersen. “A back-propagation programmed network that simulates response properties of a subset of posterior parietal neurons”. In: Nature 331.6158 (Feb. 1988), pp. 679–684

  20. [20]

    Context-Dependent Computation by Recurrent Dynamics in Prefrontal Cortex

    Valerio Mante et al. “Context-Dependent Computation by Recurrent Dynamics in Prefrontal Cortex”. In: Nature 503.7474 (Nov. 2013), pp. 78–84. doi: 10.1038/nature12742

  21. [21]

    Task representations in neural networks trained to perform many cognitive tasks

    Guangyu Robert Yang et al. “Task representations in neural networks trained to perform many cognitive tasks”. In: Nat. Neurosci. 22.2 (Feb. 2019), pp. 297–306

  22. [22]

    The Role of Population Structure in Computations through Neural Dynamics

    Alexis Dubreuil et al. “The Role of Population Structure in Computations through Neural Dynamics”. In: Nature Neuroscience 25.6 (June 2022), pp. 783–794. doi: 10.1038/s41593-022-01088-4

  23. [23]

    Abstract representations emerge naturally in neural networks trained to perform multiple tasks

    W Jeffrey Johnston and Stefano Fusi. “Abstract representations emerge naturally in neural networks trained to perform multiple tasks”. In: Nat. Commun. 14.1 (Feb. 2023), p. 1040

  24. [24]

    Flexible Multitask Computation in Recurrent Networks Utilizes Shared Dynamical Motifs

    Laura N. Driscoll, Krishna Shenoy, and David Sussillo. “Flexible Multitask Computation in Recurrent Networks Utilizes Shared Dynamical Motifs”. In: Nature Neuroscience 27.7 (July 2024), pp. 1349–1363. doi: 10.1038/s41593-024-01668-6

  25. [25]

    Modular representations emerge in neural networks trained to perform context-dependent tasks

    W Jeffrey Johnston and Stefano Fusi. “Modular representations emerge in neural networks trained to perform context-dependent tasks”. In: bioRxiv (Oct. 2024), p. 2024.09.30.615925

  26. [26]

    Universality and individuality in neural dynamics across large populations of recurrent networks

    Niru Maheswaranathan et al. “Universality and individuality in neural dynamics across large populations of recurrent networks”. In: Adv. Neural Inf. Process. Syst. 2019 (Dec. 2019), pp. 15629–15641

  27. [27]

    Individual differences among deep neural network models

    Johannes Mehrer et al. “Individual differences among deep neural network models”. In: Nat. Commun. 11.1 (Nov. 2020), p. 5725

  28. [28]

    Charting and navigating the space of solutions for recurrent neural networks

    E Turner, K V Dabholkar, and O Barak. “Charting and navigating the space of solutions for recurrent neural networks”. In: Thirty-Fifth Conference on Neural (2021)

  29. [29]

    Orthogonal Representations for Robust Context-Dependent Task Performance in Brains and Neural Networks

    Timo Flesch et al. “Orthogonal Representations for Robust Context-Dependent Task Performance in Brains and Neural Networks”. In: Neuron 110.7 (Apr. 2022), 1258–1270.e11. doi: 10.1016/j.neuron.2022.01.005

  30. [30]

    Aligned and oblique dynamics in recurrent neural networks

    Friedrich Schuessler et al. “Aligned and oblique dynamics in recurrent neural networks”. In: eLife 13.RP93060 (Nov. 2024), RP93060

  31. [31]

    How connectivity structure shapes rich and lazy learning in neural circuits

    Yuhan Helena Liu et al. “How connectivity structure shapes rich and lazy learning in neural circuits”. In: arXiv (Oct. 2023)

  32. [32]

    A Mean Field View of the Landscape of Two-Layer Neural Networks

    Song Mei, Andrea Montanari, and Phan-Minh Nguyen. “A Mean Field View of the Landscape of Two-Layer Neural Networks”. In: Proceedings of the National Academy of Sciences 115.33 (Aug. 2018), E7665–E7671. doi: 10.1073/pnas.1806579115

  33. [33]

    Trainability and Accuracy of Neural Networks: An Interacting Particle System Approach

    Grant M. Rotskoff and Eric Vanden-Eijnden. “Trainability and Accuracy of Neural Networks: An Interacting Particle System Approach”. In: Communications on Pure and Applied Mathematics 75.9 (Sept. 2022), pp. 1889–1935. doi: 10.1002/cpa.22074

  34. [34]

    Justin Sirignano and Konstantinos Spiliopoulos. Mean Field Analysis of Neural Networks: A Law of Large Numbers. Nov. 2019. doi: 10.48550/arXiv.1805.01053

  35. [35]

    Information theory and statistical mechanics

    E T Jaynes. “Information theory and statistical mechanics”. In: Phys. Rev. 106.4 (May 1957), pp. 620–630

  36. [36]

    On the control of automatic processes: a parallel distributed processing account of the Stroop effect

    J D Cohen, K Dunbar, and J L McClelland. “On the control of automatic processes: a parallel distributed processing account of the Stroop effect”. In: Psychol. Rev. 97.3 (July 1990), pp. 332–361

  37. [37]

    The sparseness of mixed selectivity neurons controls the generalization-discrimination trade-off

    Omri Barak, Mattia Rigotti, and Stefano Fusi. “The sparseness of mixed selectivity neurons controls the generalization-discrimination trade-off”. In: J. Neurosci. 33.9 (Feb. 2013), pp. 3844–3856

  38. [38]

    Neural correlates of task switching in prefrontal cortex and primary auditory cortex in a novel stimulus selection task for rodents

    Chris C Rodgers and Michael R DeWeese. “Neural correlates of task switching in prefrontal cortex and primary auditory cortex in a novel stimulus selection task for rodents”. In: Neuron 82.5 (June 2014), pp. 1157–1170

  39. [39]

    Abstract Context Representations in Primate Amygdala and Prefrontal Cortex

    A Saez et al. “Abstract Context Representations in Primate Amygdala and Prefrontal Cortex”. In: Neuron 87.4 (Aug. 2015), pp. 869–881

  40. [40]

    Cortical Information Flow during Flexible Sensorimotor Decisions

    Markus Siegel, Timothy J. Buschman, and Earl K. Miller. “Cortical Information Flow during Flexible Sensorimotor Decisions”. In: Science 348.6241 (June 2015), pp. 1352–1355. doi: 10.1126/science.aab0551

  41. [41]

    Individual Variability of Neural Computations Underlying Flexible Decisions

    Marino Pagan et al. “Individual Variability of Neural Computations Underlying Flexible Decisions”. In: Nature 639.8054 (Mar. 2025), pp. 421–429. doi: 10.1038/s41586-024-08433-6

  42. [42]

    Coordinated Response Modulations Enable Flexible Use of Visual Information

    Ramanujan Srinath, Martyna M. Czarnik, and Marlene R. Cohen. Coordinated Response Modulations Enable Flexible Use of Visual Information. July 2024. doi: 10.1101/2024.07.10.602774

  43. [43]

    Task Set and Prefrontal Cortex

    Katsuyuki Sakai. “Task Set and Prefrontal Cortex”. In: Annu. Rev. Neurosci. 31.1 (2008), pp. 219–245

  44. [44]

    Neural Mechanisms that Make Perceptual Decisions Flexible

    Gouki Okazawa and Roozbeh Kiani. “Neural Mechanisms that Make Perceptual Decisions Flexible”. In: Annu. Rev. Physiol. (Nov. 2022)

  45. [45]

    The Neural Race Reduction: Dynamics of Abstraction in Gated Networks

    Andrew M. Saxe, Shagun Sodhani, and Sam Lewallen. The Neural Race Reduction: Dynamics of Abstraction in Gated Networks. July 2022. doi: 10.48550/arXiv.2207.10430

  46. [46]

    A Category-Free Neural Population Supports Evolving Demands during Decision-Making

    David Raposo, Matthew T. Kaufman, and Anne K. Churchland. “A Category-Free Neural Population Supports Evolving Demands during Decision-Making”. In: Nature Neuroscience 17.12 (Dec. 2014), pp. 1784–1792. doi: 10.1038/nn.3865

  47. [47]

    Neural Tangent Kernel: Convergence and Generalization in Neural Networks

    Arthur Jacot, Franck Gabriel, and Clement Hongler. “Neural Tangent Kernel: Convergence and Generalization in Neural Networks”. In: Advances in Neural Information Processing Systems. Ed. by S Bengio et al. Vol. 31. Curran Associates, Inc., 2018, pp. 8571–8580

  48. [48]

    On lazy training in differentiable programming

    L Chizat, E Oyallon, and F Bach. “On lazy training in differentiable programming”. In: Adv. Neural Inf. Process. Syst. (2019)

  49. [49]

    Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks

    Sanjeev Arora et al. “Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks”. In: Proceedings of the 36th International Conference on Machine Learning. Ed. by Kamalika Chaudhuri and Ruslan Salakhutdinov. Vol. 97. Proceedings of Machine Learning Research. PMLR, 2019, pp. 322–332

  50. [50]

    Wide neural networks of any depth evolve as linear models under gradient descent

    Jaehoon Lee et al. “Wide neural networks of any depth evolve as linear models under gradient descent”. In: Adv. Neural Inf. Process. Syst. 32 (2019)

  51. [51]

    Kernel and Rich Regimes in Overparametrized Models

    Blake Woodworth et al. “Kernel and Rich Regimes in Overparametrized Models”. In: Proceedings of Thirty Third Conference on Learning Theory. Ed. by Jacob Abernethy and Shivani Agarwal. Vol. 125. Proceedings of Machine Learning Research. PMLR, 2020, pp. 3635–3673

  52. [52]

    Disentangling feature and lazy training in deep neural networks

    Mario Geiger et al. “Disentangling feature and lazy training in deep neural networks”. In: J. Stat. Mech: Theory Exp. 2020.11 (Nov. 2020), p. 113301

  53. [53]

    Geometric compression of invariant manifolds in neural nets

    Jonas Paccolata et al. “Geometric compression of invariant manifolds in neural nets”. In: arXiv preprint arXiv:2007.11471 (2020)

  54. [54]

    Toward a Unified Theory of Efficient, Predictive, and Sparse Coding

    Matthew Chalk, Olivier Marre, and Gašper Tkačik. “Toward a Unified Theory of Efficient, Predictive, and Sparse Coding”. In: Proceedings of the National Academy of Sciences 115.1 (Jan. 2018), pp. 186–191. doi: 10.1073/pnas.1711114115

  55. [55]

    Neuromodulated Spike-timing-Dependent Plasticity, and theory of three-factor learning rules

    Nicolas Frémaux and Wulfram Gerstner. “Neuromodulated Spike-timing-Dependent Plasticity, and theory of three-factor learning rules”. In: Front. Neural Circuits 9 (2015), p. 85

  56. [56]

    Synaptic plasticity forms and functions

    Jeffrey C Magee and Christine Grienberger. “Synaptic plasticity forms and functions”. In: Annu. Rev. Neurosci. 43.1 (July 2020), pp. 95–117

  57. [57]

    Random Synaptic Feedback Weights Support Error Backpropagation for Deep Learning

    Timothy P. Lillicrap et al. “Random Synaptic Feedback Weights Support Error Backpropagation for Deep Learning”. In: Nature Communications 7.1 (Nov. 2016), p. 13276. doi: 10.1038/ncomms13276

  58. [58]

    Direct Feedback Alignment Provides Learning in Deep Neural Networks

    Arild Nøkland. “Direct Feedback Alignment Provides Learning in Deep Neural Networks”. In: Advances in Neural Information Processing Systems. Vol. 29. Curran Associates, Inc., 2016

  59. [59]

    A mathematical theory of semantic development in deep neural networks

    Andrew M Saxe, James L McClelland, and Surya Ganguli. “A mathematical theory of semantic development in deep neural networks”. In: Proc. Natl. Acad. Sci. U. S. A. 116.23 (June 2019), pp. 11537–11546

  60. [60]

    The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks

    Blake Bordelon and Cengiz Pehlevan. The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks. https://arxiv.org/abs/2210.02157v2. Oct. 2022

  61. [61]

    Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks

    Pratik Chaudhari and Stefano Soatto. “Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks”. In: arXiv [cs.LG] (Oct. 2017)

  62. [62]

    Energy–entropy competition and the effectiveness of stochastic gradient descent in machine learning

    Yao Zhang et al. “Energy–entropy competition and the effectiveness of stochastic gradient descent in machine learning”. In: Mol. Phys. 116.21-22 (Nov. 2018), pp. 3214–3223

  63. [63]

    Machine learning in and out of equilibrium

    Shishir Adhikari et al. “Machine learning in and out of equilibrium”. In: arXiv [cs.LG] (June 2023)

  64. [64]

    Stochastic Gradient Descent as Approximate Bayesian Inference

    Stephan Mandt, Matthew D. Hoffman, and David M. Blei. Stochastic Gradient Descent as Approximate Bayesian Inference. Jan. 2018. doi: 10.48550/arXiv.1704.04289

  65. [65]

    Bayesian learning and inference in recurrent switching linear dynamical systems

    S Linderman, M Johnson, A Miller, et al. “Bayesian learning and inference in recurrent switching linear dynamical systems”. In: Artif. Intell. (2017)

  66. [66]

    Diversity of emergent dynamics in competitive threshold-linear networks

    Katherine Morrison et al. “Diversity of emergent dynamics in competitive threshold-linear networks”. In: SIAM J. Appl. Dyn. Syst. 23.1 (Mar. 2024), pp. 855–884

  67. [67]

    Mechanisms underlying gain modulation in the cortex

    Katie A Ferguson and Jessica A Cardin. “Mechanisms underlying gain modulation in the cortex”. In: Nat. Rev. Neurosci. 21.2 (Feb. 2020), pp. 80–92

  68. [68]

    Gain modulation: a major computational principle of the central nervous system

    E Salinas and P Thier. “Gain modulation: a major computational principle of the central nervous system”. In: Neuron 27.1 (July 2000), pp. 15–21

  69. [69]

    Motor primitives in space and time via targeted gain modulation in cortical networks

    Jake P Stroud et al. “Motor primitives in space and time via targeted gain modulation in cortical networks”. In: Nat. Neurosci. 21.12 (Dec. 2018), pp. 1774–1783

  70. [70]

    Structured flexibility in recurrent neural networks via neuromodulation

    Julia C Costacurta et al. “Structured flexibility in recurrent neural networks via neuromodulation”. In: bioRxiv 37 (July 2024), pp. 1954–1972

  71. [71]

    Thalamic control of cortical dynamics in a model of flexible motor sequencing

    Laureline Logiaco, L F Abbott, and Sean Escola. “Thalamic control of cortical dynamics in a model of flexible motor sequencing”. In: Cell Rep. 35.9 (June 2021), p. 109090

  72. [72]

    Optimal anticipatory control as a theory of motor preparation: A thalamo-cortical circuit model

    Ta-Chu Kao, Mahdieh S Sadabadi, and Guillaume Hennequin. “Optimal anticipatory control as a theory of motor preparation: A thalamo-cortical circuit model”. In: Neuron 109.9 (May 2021), 1567–1581.e12

  73. [73]

    Latent circuit inference from heterogeneous neural responses during cognitive tasks

    Christopher Langdon and Tatiana A Engel. “Latent circuit inference from heterogeneous neural responses during cognitive tasks”. In: Nat. Neurosci. 28.3 (Mar. 2025), pp. 665–675

  74. [74]

    Linking Connectivity, Dynamics, and Computations in Low-Rank Recurrent Neural Networks

    Francesca Mastrogiuseppe and Srdjan Ostojic. “Linking Connectivity, Dynamics, and Computations in Low-Rank Recurrent Neural Networks”. In: Neuron 99.3 (Aug. 2018), 609–623.e29. doi: 10.1016/j.neuron.2018.07.003

  75. [75]

    Extracting computational mechanisms from neural activity with low-rank networks

    Adrian Valente, Jonathan Pillow, and Srdjan Ostojic. “Extracting computational mechanisms from neural activity with low-rank networks”. In: Neur Inf Proc Sys 35 (2022), pp. 24072–24086

  76. [76]

    Early selection of task-relevant features through population gating

    Joao Barbosa et al. “Early selection of task-relevant features through population gating”. In: Nat. Commun. 14.1 (Oct. 2023), p. 6837

  77. [77]

    Shaping Dynamics With Multiple Populations in Low-Rank Recurrent Networks

    Manuel Beiran et al. “Shaping Dynamics With Multiple Populations in Low-Rank Recurrent Networks”. In: Neural Comput. 33.6 (May 2021), pp. 1572–1615

  78. [78]

    Lagrangian Duality and Convex Optimization

    David Rosenberg and Julia Kempe. Lagrangian Duality and Convex Optimization. Lecture Notes. CDS, NYU, Feb. 2019. url: https://davidrosenberg.github.io/mlcourse/Archive/2019/Lectures/04a.convex-optimization.pdf

  79. [79]

    Convex Optimization

    Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge, New York, Melbourne, New Delhi, Singapore: Cambridge University Press, 2023. 727 pp.

  80. [80]

    Topics in Propagation of Chaos

    Alain-Sol Sznitman. “Topics in Propagation of Chaos”. In: École d’Été de Probabilités de Saint-Flour XIX — 1989. Vol. 1464. Berlin, Heidelberg: Springer Berlin Heidelberg, 1991, pp. 165–251. doi: 10.1007/BFb0085169

A Maximum Entropy calculation

A.1 Recap on Convex optimization

We start with a general summary of the convex optimization approach that ...
