Compositional Deep Learning

Bruno Gavranović

arxiv: 1907.08292 · v1 · pith:SLNNXORWnew · submitted 2019-07-16 · cs.LG · cs.AI · math.CT

Pith reviewed 2026-05-24 20:58 UTC · model grok-4.3

classification cs.LG · cs.AI · math.CT

keywords category theory · functors · CycleGAN · unpaired learning · image-to-image translation · object insertion · compositional neural networks · gradient descent

The pith

Neural networks can be represented as learnable functors on categorical schemas rather than ordinary functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a category-theoretic formalism for CycleGAN-style models, which it treats as collections of networks closed under composition. The interconnection of the networks is specified by a categorical schema, and network instances are set-valued functors on that schema. Gradient descent is then used to learn these functors, enforcing composition invariants such as cycle-consistencies and thereby increasing inductive bias. The same framework is used to design a new architecture for object insertion and deletion in images from unpaired data; tests on three datasets give what the abstract calls promising results.

Core claim

By specifying network interconnections via a categorical schema and representing instances as set-valued functors on that schema, a special class of functors can be optimized with gradient descent; this yields a novel architecture that learns to insert and delete objects in images using only unpaired training data.
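As a rough illustration of what optimizing such a structure with gradient descent can look like, the sketch below trains two linear maps so that both round trips approximate the identity. Everything in it is an assumption made for illustration: the two-object schema, the names `SCHEMA`, `cycle_loss_and_grads`, and `train`, the use of linear maps in place of neural networks, and the soft penalty on the path equations. It omits the adversarial terms that CycleGAN also uses and is not the paper's architecture.

```python
import numpy as np

# Hypothetical toy schema, recorded as plain data for reference only.
# The loss below hard-codes its two path equations.
SCHEMA = {
    "objects": ["A", "B"],
    "morphisms": {"f": ("A", "B"), "g": ("B", "A")},
    # Each entry: (path applied left to right, object whose identity it must equal).
    "equations": [(("f", "g"), "A"), (("g", "f"), "B")],
}


def cycle_loss_and_grads(W_f, W_g, X, Y):
    """Penalty for violating g.f = id_A and f.g = id_B, with gradients.

    W_f, W_g: square matrices standing in for the images of f and g.
    X, Y: sample arrays of shape (n, dim) drawn from the sets at A and B.
    """
    n = X.shape[0]
    R_A = X @ (W_g @ W_f).T - X  # residual of the round trip A -> B -> A
    R_B = Y @ (W_f @ W_g).T - Y  # residual of the round trip B -> A -> B
    loss = (np.sum(R_A ** 2) + np.sum(R_B ** 2)) / n

    dP = (2.0 / n) * R_A.T @ X  # gradient w.r.t. the product W_g @ W_f
    dQ = (2.0 / n) * R_B.T @ Y  # gradient w.r.t. the product W_f @ W_g
    grad_f = W_g.T @ dP + dQ @ W_g.T
    grad_g = dP @ W_f.T + W_f.T @ dQ
    return loss, grad_f, grad_g


def train(steps=300, lr=0.05, dim=2, n=256, seed=0):
    """Plain gradient descent on the cycle penalty from a near-identity start."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, dim))
    Y = rng.normal(size=(n, dim))
    W_f = np.eye(dim) + 0.1 * rng.normal(size=(dim, dim))
    W_g = np.eye(dim) + 0.1 * rng.normal(size=(dim, dim))
    history = []
    for _ in range(steps):
        loss, grad_f, grad_g = cycle_loss_and_grads(W_f, W_g, X, Y)
        history.append(loss)
        W_f -= lr * grad_f
        W_g -= lr * grad_g
    return W_f, W_g, history


if __name__ == "__main__":
    W_f, W_g, history = train()
    print("initial penalty:", history[0])
    print("final penalty:", history[-1])
```

In this toy the path equations are satisfied only approximately and only after training, which is the gap the referee's first major comment points at: a penalty encourages the learned maps to respect composition but does not by itself prove that they do.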

What carries the argument

Set-valued functors on a categorical schema that represent network instances while automatically enforcing composition invariants such as cycle-consistencies.
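A minimal way to write this down, assuming a schema with two objects and two morphisms (the symbols $\mathcal{C}$, $A$, $B$, $f$, $g$, and $H$ are notation chosen here for illustration, not necessarily the paper's):

```latex
% Schema: a category C presented by two generating morphisms and two path equations.
\[
f : A \to B, \qquad g : B \to A, \qquad
g \circ f = \mathrm{id}_A, \qquad f \circ g = \mathrm{id}_B .
\]
% Instance: a set-valued functor on the schema.
\[
H : \mathcal{C} \to \mathbf{Set}, \qquad
H(g) \circ H(f) = H(g \circ f) = H(\mathrm{id}_A) = \mathrm{id}_{H(A)} ,
\]
\[
H(f) \circ H(g) = H(f \circ g) = H(\mathrm{id}_B) = \mathrm{id}_{H(B)} .
\]
```

Read this way, cycle-consistency is the statement that the functions $H(f)$ and $H(g)$ respect the schema's path equations, which any functor must do because functors preserve composition and identities.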

If this is right

Gradient descent becomes applicable to optimization over functorial rather than merely functional structures.
Cycle-consistency and other composition invariants can be guaranteed by the schema rather than added as separate loss terms.
Architectures for new unpaired tasks become constructible by choosing different schemas and functorial data migrations.
Datasets, models, and architectures themselves can be treated uniformly as objects in the same categorical setting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same schema-plus-functor approach could be applied to sequence models or graph neural networks where composition order matters.
If the functors remain interpretable after training, the learned mappings might be inspected directly rather than only through input-output behavior.
Extending the schemas to include higher-dimensional categorical structures might allow joint learning of multiple related tasks without additional supervision.

Load-bearing premise

That encoding network structure as functors on a schema is sufficient to enforce the desired composition rules during gradient-based training.

What would settle it

A controlled experiment in which the functor-based architecture is trained on one of the three reported datasets and then evaluated on paired ground-truth object insertion or deletion; if accuracy or consistency metrics do not exceed those of a standard unpaired baseline, the claim that the categorical representation improves learning would be falsified.
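The bookkeeping for such a comparison could look like the sketch below. It assumes mean absolute per-pixel error as the paired metric and a mean with standard error over independent runs as the summary; both choices, and the function names `paired_l1_error` and `summarize_runs`, are hypothetical and not taken from the paper.

```python
import numpy as np


def paired_l1_error(predicted, target):
    """Mean absolute per-pixel error between predicted and ground-truth images."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    if predicted.shape != target.shape:
        raise ValueError("predicted and target must have the same shape")
    return float(np.mean(np.abs(predicted - target)))


def summarize_runs(errors):
    """Return (mean, standard error) of a metric over independent training runs."""
    errors = np.asarray(errors, dtype=float)
    mean = float(errors.mean())
    if errors.size > 1:
        sem = float(errors.std(ddof=1) / np.sqrt(errors.size))
    else:
        sem = float("nan")  # a single run gives no spread estimate
    return mean, sem
```

Each model (the schema-based architecture and the unpaired baseline) would be summarized with `summarize_runs` over its per-run errors, and the two summaries compared on the same held-out paired set.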

Original abstract

Neural networks have become an increasingly popular tool for solving many real-world problems. They are a general framework for differentiable optimization which includes many other machine learning approaches as special cases. In this thesis we build a category-theoretic formalism around a class of neural networks exemplified by CycleGAN. CycleGAN is a collection of neural networks, closed under composition, whose inductive bias is increased by enforcing composition invariants, i.e. cycle-consistencies. Inspired by Functorial Data Migration, we specify the interconnection of these networks using a categorical schema, and network instances as set-valued functors on this schema. We also frame neural network architectures, datasets, models, and a number of other concepts in a categorical setting and thus show a special class of functors, rather than functions, can be learned using gradient descent. We use the category-theoretic framework to conceive a novel neural network architecture whose goal is to learn the task of object insertion and object deletion in images with unpaired data. We test the architecture on three different datasets and obtain promising results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript develops a category-theoretic formalism for CycleGAN-style neural networks by specifying their interconnections via a categorical schema and representing instances as set-valued functors. It claims that this setup allows a special class of functors to be learned using gradient descent, and uses the framework to propose a novel architecture for learning object insertion and deletion in images from unpaired data, reporting promising results on three datasets.

Significance. If the categorical approach provides an operational advantage in enforcing composition invariants like cycle-consistency through functoriality, rather than just a descriptive lens, the work could contribute to more structured inductive biases in generative adversarial networks and open avenues for category theory in deep learning architectures. The application to unpaired image editing is a practical test case.

major comments (2)

[Abstract] Abstract: the central claim that 'a special class of functors, rather than functions, can be learned using gradient descent' is not supported by any description of an explicit optimization procedure over the functor category, a proof that learned maps preserve composition, or an ablation isolating the schema's contribution versus standard cycle-consistency losses.
[Abstract] Abstract: no implementation details, baselines, specific metrics, error bars, or quantitative results are reported for the three datasets, so it is impossible to verify whether the 'promising results' substantiate the claims about functor learning or the novel architecture's performance.

minor comments (2)

The distinction between the proposed categorical framing and existing unpaired translation methods (e.g., CycleGAN) needs to be clarified to show what additional inductive bias the schema supplies.
Explicit definitions or diagrams of the categorical schema and the mapping from network instances to set-valued functors would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive feedback. We address the two major comments point by point below.

Referee: [Abstract] Abstract: the central claim that 'a special class of functors, rather than functions, can be learned using gradient descent' is not supported by any description of an explicit optimization procedure over the functor category, a proof that learned maps preserve composition, or an ablation isolating the schema's contribution versus standard cycle-consistency losses.

Authors: We agree that the abstract phrasing is strong and that the manuscript does not supply an explicit optimization procedure defined directly over the functor category, a formal proof that the learned maps preserve composition beyond the functoriality of the schema, or an ablation study. The framework uses standard gradient descent on network weights, with the categorical schema used only to define which compositions must be cycle-consistent; functoriality is enforced by construction rather than learned as an additional constraint. We will revise the abstract to moderate the claim and add a clarifying paragraph in the main text explaining the relationship between the schema and ordinary back-propagation. An ablation isolating the schema is feasible and will be included if space allows.
revision: partial

Referee: [Abstract] Abstract: no implementation details, baselines, specific metrics, error bars, or quantitative results are reported for the three datasets, so it is impossible to verify whether the 'promising results' substantiate the claims about functor learning or the novel architecture's performance.

Authors: The manuscript is primarily a theoretical contribution; the experimental section reports only qualitative outcomes on three datasets. We accept that the abstract and main text lack the requested implementation details, baselines, metrics, error bars, and quantitative tables. In the revision we will expand the experimental section with these elements, including comparisons against standard CycleGAN variants and reporting of standard metrics with error bars over multiple runs.
revision: yes

Circularity Check

0 steps flagged

No significant circularity; categorical framing applied to CycleGAN-style models without reducing claims to self-definition or fitted inputs.

The abstract and framing describe specifying network interconnections via a categorical schema and representing instances as set-valued functors, then applying gradient descent to learn them. No equations or derivation steps are exhibited that equate a claimed result (e.g., functor learning) to its own inputs by construction, nor does any load-bearing premise collapse to a self-citation whose content is unverified. The reference to Functorial Data Migration is presented as inspiration rather than an internal uniqueness theorem or ansatz that forces the outcome. The central claim therefore remains an independent modeling choice whose validity rests on external empirical tests rather than definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the application of category theory concepts to neural networks, with the new architecture being the main addition. No free parameters are mentioned, and the axioms are standard from the domain.

axioms (1)

[standard math] Standard axioms of category theory for defining schemas and functors
The framework relies on the established mathematical structure of categories, functors, and schemas from category theory.

invented entities (1)

Novel neural network architecture for object insertion and deletion (no independent evidence)
purpose: To perform the task of object insertion and deletion in images using unpaired data
The architecture is proposed in the paper as a new design based on the categorical framework, but no external validation or independent evidence is mentioned in the abstract.

pith-pipeline@v0.9.0 · 5697 in / 1359 out tokens · 29917 ms · 2026-05-24T20:58:25.797621+00:00 · methodology
